Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 8:57 AM, Steven Bosscher  wrote:
> On Thu, Oct 4, 2012 at 5:30 AM, Vladimir Makarov  wrote:
>> I was going to look at this code too, but I was interested in generating
>> fewer points and live ranges.  It is strange that in my profiles,
>> remove_some_program_points_and_update_live_ranges takes 0.6% of compiler
>> time on these huge tests.   So I was not interested in speeding up the
>> function, and maybe that is why you see no visible change in compilation
>> time.
>
> Right. The compression algorithm doesn't care much about the initial
> number of program points, only about the number of live ranges before
> and after compression. I had expected a bigger effect on the number of
> live ranges before compression.
>
> 0.6% sounds really very different from my timings. How much time does
> create_start_finish_chains take for you?
>
>
>> I don't object to the idea of the patch.  I need some time to look at it (the
>> different results on a function are a bit scary for me) and check simulator
>> times on other tests.
>
> Understood.

BTW, it would be great if you can also look at this additional patch hunk:

@@ -994,8 +1044,8 @@ lra_create_live_ranges (bool all_p)
   curr_point = 0;
   point_freq_vec = VEC_alloc (int, heap, get_max_uid () * 2);
   lra_point_freq = VEC_address (int, point_freq_vec);
-  FOR_EACH_BB (bb)
-    process_bb_lives (bb);
+  FOR_EACH_BB_REVERSE (bb)
+    process_bb_lives (bb, curr_point);
   lra_live_max_point = curr_point;
   create_start_finish_chains ();
   if (lra_dump_file != NULL)

I think this should result in more live ranges being merged. Here's
why I think so, based on my far worse understanding of this code than
yours, so forgive me if I'm Completely Wrong :-)

process_bb_lives walks insns in the basic block from last to first, so
say you have a basic block chain 1->2->3, and each block has 4 insns,
then AFAIU the program points in block 1 will be [4,3,2,1], in block 2
it will be [8,7,6,5], and in block 3 it will be [12,11,10,9]. Say a
reg is used in block 3 at point 11, and set in block 1 at point 3. Then
this reg will have a live range chain [3-1],[8-5],[12-11].

If you visit the basic blocks in reverse order, the program points
will be: 1:[12,11,10,9], 2:[8,7,6,5], 3:[4,3,2,1]. Now the same reg
will be set at point 11 and used at point 3, and the live range chain
will be just [11-3].
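
A minimal sketch of the numbering argument above, assuming three blocks of
four insns each and a pseudo that is set at the second insn of block 1 and
last used at the second insn of block 3. The function and constant names are
hypothetical and not taken from lra-lives.c; the sketch only counts how many
disjoint ranges of consecutive program points the pseudo ends up with.

```c
#include <assert.h>
#include <stdbool.h>

enum { N_BLOCKS = 3, INSNS_PER_BLOCK = 4,
       N_POINTS = N_BLOCKS * INSNS_PER_BLOCK };

/* Count the disjoint live ranges of a pseudo that is set at insn 1 of
   block 0 and last used at insn 1 of block 2 (0-based indices), when
   blocks are numbered in forward or in reverse order.  Hypothetical
   helper for illustration only.  */
static int
count_live_ranges (bool reverse_blocks)
{
  int point[N_BLOCKS][INSNS_PER_BLOCK];
  bool live[N_POINTS + 2] = { false };
  int curr_point = 0;

  /* Inside each block, insns are walked from last to first, so the
     last insn of the first visited block gets point 1.  */
  for (int i = 0; i < N_BLOCKS; i++)
    {
      int bb = reverse_blocks ? N_BLOCKS - 1 - i : i;
      for (int insn = INSNS_PER_BLOCK - 1; insn >= 0; insn--)
        point[bb][insn] = ++curr_point;
    }
  assert (curr_point == N_POINTS);

  /* Live from the set to the end of block 0, through all of block 1,
     and from the start of block 2 up to the use.  */
  for (int insn = 1; insn < INSNS_PER_BLOCK; insn++)
    live[point[0][insn]] = true;
  for (int insn = 0; insn < INSNS_PER_BLOCK; insn++)
    live[point[1][insn]] = true;
  for (int insn = 0; insn <= 1; insn++)
    live[point[2][insn]] = true;

  /* Each maximal run of consecutive live points is one live range.  */
  int ranges = 0;
  for (int p = 1; p <= N_POINTS; p++)
    if (live[p] && !live[p - 1])
      ranges++;
  return ranges;
}
```

With forward block order the sketch yields three ranges, and with reverse
block order a single one, matching the chains described above.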

I'm experimenting with this extra hunk and will report back here.

Ciao!
Steven


Re: [PATCH] Improve debug info for partial inlining (PR debug/54519, take 2)

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 01:31:08AM -0300, Alexandre Oliva wrote:
> On Oct  3, 2012, Jakub Jelinek  wrote:
> 
> > basically there is a non-addressable parameter in stack slot, and
> > vt_canon_true_dep -> canon_true_dependence thinks an argument push
> > insn might alias with it, because it doesn't have a MEM_EXPR and
> > ao_ref_from_mem fails.
> 
> I have a pending (still unreviewed) patch that might address :-), so to
> speak, this problem.

This is PR54796 now btw.

> for  gcc/ChangeLog
> from  Alexandre Oliva  
> 
>   PR debug/53671
>   PR debug/49888
>   * var-tracking.c (attrs_list_by_loc_eq): New.
>   (track_stack_pointer): New.
>   (dataflow_set_merge): Use it.
>   (vt_initialize): Record the initial stack pointer in the
>   dataflow set.

Unfortunately this patch makes this significantly worse, not better.
The change from stock gcc for pr54796.c testcase (or for pr54519-3.c
testcase with the two PR54519 patches) to one with this patch applied
additionally is (with -m32 -Os -g) that argp based var_locations are
gone, replaced by sp based locations, which change on every sp change.
That results in bigger debug info, as for argp we use the compact
DW_OP_fbreg, while otherwise we end up with longish location lists.
And the clobber_overlapping_mems resets are the same as before on
the testcases.

What I meant in the PR54796 comments is something like this untested patch.
Unfortunately it needs work: on the two testcases it seems to result in
more clobber_overlapping_mems resets in the prologues (before hfp is set),
then fewer resets in the rest of the function.  So in pr54796.c the net
effect is a partial improvement: previously none of the a, b, c vars at line
17 could be printed, now a and c can, while b was reset in the prologue and
can't.  On pr54519-3.c, y and z in the second case are still unavailable, but
are newly also unavailable in the first case, where they were previously
available.

--- gcc/cselib.h.jj 2012-03-06 17:02:12.0 +0100
+++ gcc/cselib.h 2012-10-04 08:42:04.879335722 +0200
@@ -99,6 +99,8 @@ extern void cselib_preserve_only_values
 extern void cselib_preserve_cfa_base_value (cselib_val *, unsigned int);
 extern void cselib_add_permanent_equiv (cselib_val *, rtx, rtx);
 extern bool cselib_have_permanent_equivalences (void);
+extern void cselib_set_value_sp_based (cselib_val *);
+extern bool cselib_sp_based_value_p (cselib_val *);
 
 extern void dump_cselib_table (FILE *);
 
--- gcc/alias.c.jj  2012-09-12 10:57:03.0 +0200
+++ gcc/alias.c 2012-10-04 08:45:22.893221350 +0200
@@ -1641,6 +1641,9 @@ find_base_term (rtx x)
   if (!val)
     return ret;
 
+  if (cselib_sp_based_value_p (val))
+   return static_reg_base_value[STACK_POINTER_REGNUM];
+
   f = val->locs;
   /* Temporarily reset val->locs to avoid infinite recursion.  */
   val->locs = NULL;
--- gcc/cselib.c.jj 2012-08-24 23:44:03.0 +0200
+++ gcc/cselib.c 2012-10-04 08:45:12.722278664 +0200
@@ -210,6 +210,9 @@ void (*cselib_record_sets_hook) (rtx ins
 #define PRESERVED_VALUE_P(RTX) \
   (RTL_FLAG_CHECK1("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
 
+#define SP_BASED_VALUE_P(RTX) \
+  (RTL_FLAG_CHECK1("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
+
 
 
 /* Allocate a struct elt_list and fill in its two elements with the
@@ -739,6 +742,23 @@ cselib_preserve_only_values (void)
   gcc_assert (first_containing_mem == &dummy_val);
 }
 
+/* Arrange for a value to be marked as based on stack pointer
+   for find_base_term purposes.  */
+
+void
+cselib_set_value_sp_based (cselib_val *v)
+{
+  SP_BASED_VALUE_P (v->val_rtx) = 1;
+}
+
+/* Test whether a value is marked as based on stack pointer.  */
+
+bool
+cselib_sp_based_value_p (cselib_val *v)
+{
+  return SP_BASED_VALUE_P (v->val_rtx);
+}
+
 /* Return the mode in which a register was last set.  If X is not a
    register, return its mode.  If the mode in which the register was
    set is not known, or the value was already clobbered, return
--- gcc/var-tracking.c.jj   2012-10-03 09:01:36.297902370 +0200
+++ gcc/var-tracking.c  2012-10-04 09:05:58.294277166 +0200
@@ -5769,6 +5769,11 @@ add_stores (rtx loc, const_rtx expr, voi
 
   resolve = preserve = !cselib_preserved_value_p (v);
 
+  if (loc == stack_pointer_rtx
+      && hard_frame_pointer_adjustment != -1
+      && preserve)
+    cselib_set_value_sp_based (v);
+
   nloc = replace_expr_with_values (oloc);
   if (nloc)
     oloc = nloc;
--- gcc/rtl.h.jj 2012-09-28 14:16:57.0 +0200
+++ gcc/rtl.h   2012-09-28 14:16:57.0 +0200
@@ -267,7 +267,8 @@ struct GTY((chain_next ("RTX_NEXT (&%h)"
  1 in a CALL_INSN if it is a sibling call.
  1 in a SET that is for a return.
  In a CODE_LABEL, part of the two-bit alternate entry field.
- 1 in a CONCAT is VAL_EXPR_IS_COPIED in var-tracking.c.  */
+ 1 in a CONCAT is VAL_EXPR_IS_COPIED in var-tracking.c.
+ 1 in a VALUE is SP_BASED_VALUE_P in cselib.c.  */
   unsigned int j

Re: Convert more non-GTY htab_t to hash_table.

2012-10-04 Thread Richard Guenther
On Tue, 2 Oct 2012, Lawrence Crowl wrote:

> On 10/2/12, Richard Guenther  wrote:
> > On Mon, 1 Oct 2012, Lawrence Crowl wrote:
> > > Change more non-GTY hash tables to use the new type-safe
> > > template hash table.  Constify member function parameters that
> > > can be const.  Correct a couple of expressions in formerly
> > > uninstantiated templates.
> > >
> > > The new code is 0.362% faster in bootstrap, with a 99.5%
> > > confidence of being faster.
> > >
> > > Tested on x86-64.
> > >
> > > Okay for trunk?
> >
> > You are changing a hashtable used by fold checking, did you test
> > with fold checking enabled?
> 
> I didn't know I had to do anything beyond the normal make check.
> What do I do?
> 
> > +/* Data structures used to maintain mapping between basic blocks and
> > +   copies.  */
> > +static hash_table  bb_original;
> > +static hash_table  bb_copy;
> >
> > note that because hash_table has a constructor we now get global
> > CTORs for all statics :( (and mx-protected local inits ...)
> 
> The overhead for the global constructors isn't significant.
> Only the function-local statics have mx-protection, and that can
> be eliminated by making them global static.
> 
> > Can you please try to remove the constructor from hash_table to
> > avoid this overhead?  (as a followup - that is, don't initialize
> > htab)
> 
> The initialization avoids potential errors in calling dispose.
> I can do it, but I don't think the overhead (after moving the
> function-local statics to global) will matter, and so I prefer to
> keep the safety.  So is the move of the statics sufficient or do
> you still want to remove constructors?

Hm, having them in-scope where they are used is good style.
Why can't they be statically initialized and put in .data?
Please make it so - you know C++ enough (ISTR value-initialization
is default - which means NULL for the pointer?)

Richard.

> > The cfg.c, dse.c and hash-table.h parts are ok for trunk, I'll
> > leave the rest to respective maintainers of the pieces of the
> > compiler.
> >
> > Thanks,
> > Richard.
> >
> >>
> >> Index: gcc/java/ChangeLog
> >>
> >> 2012-10-01  Lawrence Crowl  
> >>
> >>* Make-lang.in (JAVA_OBJS): Add dependence on hash-table.o.
> >>(JCFDUMP_OBJS): Add dependence on hash-table.o.
> >>(jcf-io.o): Add dependence on hash-table.h.
> >>* jcf-io.c (memoized_class_lookups): Change to use type-safe hash table.
> >>
> >> Index: gcc/c/ChangeLog
> >>
> >> 2012-10-01  Lawrence Crowl  
> >>
> >>* Make-lang.in (c-decl.o): Add dependence on hash-table.h.
> >>* c-decl.c (detect_field_duplicates_hash): Change to new type-safe
> >>hash table.
> >>
> >> Index: gcc/objc/ChangeLog
> >>
> >> 2012-10-01  Lawrence Crowl  
> >>
> >>* Make-lang.in (OBJC_OBJS): Add dependence on hash-table.o.
> >>(objc-act.o): Add dependence on hash-table.h.
> >>* objc-act.c (objc_detect_field_duplicates): Change to new type-safe
> >>hash table.
> >>
> >> Index: gcc/ChangeLog
> >>
> >> 2012-10-01  Lawrence Crowl  
> >>
> >>* Makefile.in (fold-const.o): Add dependence on hash-table.h.
> >>(dse.o): Likewise.
> >>(cfg.o): Likewise.
> >>* fold-const.c (fold_checksum_tree): Change to new type-safe hash table.
> >>* (print_fold_checksum): Likewise.
> >>* cfg.c (var bb_original): Likewise.
> >>* (var bb_copy): Likewise.
> >>* (var loop_copy): Likewise.
> >>* hash-table.h (template hash_table): Constify parameters for find...
> >>and remove_elt... member functions.
> >> (hash_table::empty) Correct size expression.
> >> (hash_table::clear_slot) Correct deleted entry assignment.
> >>* dse.c (var rtx_group_table): Change to new type-safe hash table.
> >>
> >> Index: gcc/cp/ChangeLog
> >>
> >> 2012-10-01  Lawrence Crowl  
> >>
> >>* Make-lang.in (class.o): Add dependence on hash-table.h.
> >>(tree.o): Likewise.
> >>(semantics.o): Likewise.
> >>* class.c (fixed_type_or_null): Change to new type-safe hash table.
> >>* tree.c (verify_stmt_tree): Likewise.
> >>(verify_stmt_tree_r): Likewise.
> >>* semantics.c (struct nrv_data): Likewise.
> >>
> >>
> >> Index: gcc/java/Make-lang.in
> >> ===
> >> --- gcc/java/Make-lang.in  (revision 191941)
> >> +++ gcc/java/Make-lang.in  (working copy)
> >> @@ -83,10 +83,10 @@ JAVA_OBJS = java/class.o java/decl.o jav
> >>java/zextract.o java/jcf-io.o java/win32-host.o java/jcf-parse.o
> >> java/mangle.o \
> >>java/mangle_name.o java/builtins.o java/resource.o \
> >>java/jcf-depend.o \
> >> -  java/jcf-path.o java/boehm.o java/java-gimplify.o
> >> +  java/jcf-path.o java/boehm.o java/java-gimplify.o hash-table.o
> >>
> >>  JCFDUMP_OBJS = java/jcf-dump.o java/jcf-io.o java/jcf-depend.o
> >> java/jcf-path.o \
> >> -  java/win32-host.o java/zextract.o ggc-none.o
> >> +  java/win32-host.o java/zextract.o ggc-none.o hash-table.o
> >>
> >>  JVGENMAIN_OBJS = java/jvgenm

Re: [PATCH] Fix up DW_TAG_formal_parameter placement

2012-10-04 Thread Richard Guenther
On Wed, 3 Oct 2012, Jakub Jelinek wrote:

> Hi!
> 
> With the PR54519 patch I've just posted, I've noticed on the
> same testcase from yesterday's IRC:
> static inline void foo (int x, int y) { asm volatile ("nop"); }
> static inline void bar (int z) { foo (z, 0); foo (z, 1); }
> int main ()
> {
>   bar (0);
>   bar (1);
>   return 0;
> }
> that while I can print x and y just fine, if I do bt, x, y and z printed
> in the backtrace are all optimized out.
> The problem is that first tree versioning for foo.isra.* or bar.isra.*
> deposits the optimized away parameters as VAR_DECLs into the DECL_INITIAL
> block (which is fine), but then during inlining they end up in the remapped
> block of DECL_INITIAL, not the new block added above it into which inliner
> puts parameters.  So in the debug info we have
> DW_TAG_inlined_subroutine
>   DW_TAG_formal_parameter for non-optimized away parameters
>   DW_TAG_lexical_block
> DW_TAG_formal_parameter for optimized away parameters
> and the debugger (expectably) looks only at DW_TAG_inlined_subroutine
> DIE's immediate children for the formal parameters to print during
> backtrace.
> Fixed up by moving the VAR_DECLs for parameters optimized away by versioning
> to BLOCK_SUPERCONTEXT during inlining, at that point we know both of the
> blocks have the same scope, and if the original DECL_INITIAL doesn't contain
> any other vars, we can actually end up with shorter/more correct debug info
> as well as memory savings due to being able to GC the remapped DECL_INITIAL
> block.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This looks like the wrong place to fix things to me ... either we can
fix this at the point we create the VAR_DECLs for the optimized away
PARM_DECLs (or we should delay that until here?) or we fix it up
in dwarf2out.c (how does this fix interact with stabs and the other
debuginfo formats?  mentioning DWARF in tree-inline looks odd,
unless we get rid of the other formats - something I'd of course
welcome ;))

Thanks,
Richard.

> 2012-10-03  Jakub Jelinek  
> 
>   * tree-inline.c (expand_call_inline): Move VAR_DECLs with
>   PARM_DECL origins from remapped DECL_INITIAL's BLOCK_VARS
>   into id->block's BLOCK_VARS.
> 
> --- gcc/tree-inline.c.jj  2012-10-02 17:43:13.0 +0200
> +++ gcc/tree-inline.c 2012-10-02 19:43:52.576382413 +0200
> @@ -3946,7 +3946,29 @@ expand_call_inline (basic_block bb, gimp
>initialize_inlined_parameters (id, stmt, fn, bb);
>  
>if (DECL_INITIAL (fn))
> -prepend_lexical_block (id->block, remap_blocks (DECL_INITIAL (fn), id));
> +{
> +  tree *var;
> +
> +  prepend_lexical_block (id->block, remap_blocks (DECL_INITIAL (fn), id));
> +  gcc_checking_assert (BLOCK_SUBBLOCKS (id->block)
> +&& (BLOCK_CHAIN (BLOCK_SUBBLOCKS (id->block))
> +== NULL_TREE));
> +  /* Move vars for PARM_DECLs from DECL_INITIAL block to id->block,
> +  otherwise DW_TAG_formal_parameter will not be children of
> +  DW_TAG_inlined_subroutine, but of a DW_TAG_lexical_block
> +  under it.  The parameters can be then evaluated in the debugger,
> +  but don't show in backtraces.  */
> +  for (var = &BLOCK_VARS (BLOCK_SUBBLOCKS (id->block)); *var; )
> + if (TREE_CODE (DECL_ORIGIN (*var)) == PARM_DECL)
> +   {
> + tree v = *var;
> + *var = TREE_CHAIN (v);
> + TREE_CHAIN (v) = BLOCK_VARS (id->block);
> + BLOCK_VARS (id->block) = v;
> +   }
> + else
> +   var = &TREE_CHAIN (*var);
> +}
>  
>/* Return statements in the function body will be replaced by jumps
>   to the RET_LABEL.  */
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend


Re: RFC: Using DECL_NO_LIMIT_STACK as a backend specific flag

2012-10-04 Thread Richard Guenther
On Wed, Oct 3, 2012 at 5:20 PM, nick clifton  wrote:
> Hi Ian,
>
>
>> Can't you just keep a list of the decls for which you have issued the
>> warning?
>
>
> Yes - that would work too.  In fact I agree that this would be cleaner
> solution in my particular case.  I'll create a new patch...
>
>
>
>> How many decls do you expect to be on that list?
>
>
> Not very many.  Maybe two or three at most.
>
> But I am interested to know if there is a way for targets to add their own
> information to function decls (and decls in general).  If not for this
> particular case, then for problems to come in the future.

Not to the decl.  But I suppose you are only looking at defined functions,
so it should suffice to amend struct function.  It's reasonable to allow
the target to associate extra info with struct function and we already
have a way to do that via the init_machine_status target hook.

Richard.

> Cheers
>   Nick
>
>


Re: PATCH trunk: gengtype honoring mark_hook-s inside struct inside union-s

2012-10-04 Thread Basile Starynkevitch
On Wed, Oct 03, 2012 at 01:02:44PM +0200, Basile Starynkevitch wrote:
> On Wed, Oct 03, 2012 at 12:21:02PM +0300, Laurynas Biveinis wrote:
> > Hello Basile -
> > 
> > > 2012-10-02  Basile Starynkevitch  
> > >
> > > * gengtype.c (walk_type): Emit mark_hook when inside a
> > >   struct of a union member.
> > 
> > Can you send me off-list the gengtype output before and after the fix?
> 
> I messed something up; the example I sent was wrong. Let's start all over
> again.
> 
> Consider the following file 
[...]

This is PR54809 on our bugzilla.

Thanks.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: RFA: add lock_length attribute to break branch-shortening cycles

2012-10-04 Thread Richard Guenther
On Wed, Oct 3, 2012 at 8:22 PM, Joern Rennecke
 wrote:
> The ARCompact architecture has some pipelining features that result in
> the vanilla branch shortening not always converging.
>
> Moreover, there are some short, complex branch instructions with very short
> offsets; replacing them with a multi-insn sequence when the offset doesn't
> reach makes the code significantly longer.  Thus, when starting branch
> shortening with pessimistic assumptions, the short branches are often not
> used because of the pessimistic branch length causing the offsets going out
> of range.
> This problem can be avoided when starting with a low estimate and working
> upwards.  However, that makes the incidence of infinite branch shortening
> cycles higher, and also makes it impossible to just break out after some
> iteration count.
>
> To address these issues, I've made the generator programs recognize the
> optional lock_length attribute.
>
> To quote from the documentation added for this feature:
>
>  If you define the `lock_length' attribute, branch shortening will work
> the other way round: it starts out assuming minimum instruction lengths
> and iterates from there.  In addition, the value of the `lock_length'
> attribute does not decrease across iterations, and the value computed
> for the `length' attribute will be no smaller than that of the
> `lock_length' attribute.

I am missing a few things in this description:
- what is the value of lock_length supposed to be?  From the "lock"
  prefix it sounds like it is something unchanging, maybe even constant,
  thus a maximum?
- the length attribute still needs to be specified when lock_length is?
  how do they relate?  Is lock_length always smaller / bigger than length?
- what happens if you have patterns with lock_length and patterns without?
- what patterns does lock_length apply to?

In general, attacking this kind of problem optimistically should always be
better - did you try simply switching this for all targets?  It shouldn't be
slower, and the only thing you need to guarantee is that during iteration
you never make insn lengths smaller again.
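
A rough sketch of that optimistic scheme, under simplifying assumptions:
every branch has exactly one short and one long form, the displacement is
measured between insn start addresses, and a length only ever grows. The
struct and function names are hypothetical and do not correspond to anything
in final.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical insn model: a non-branch has target == -1 and
   short_len == long_len.  */
struct insn
{
  int target;     /* index of the branch target, or -1 */
  int short_len;  /* length of the short (or only) form */
  int long_len;   /* length of the long form */
  int range;      /* largest |displacement| the short form reaches */
  int len;        /* current length estimate */
};

/* Start address of insn IDX under the current length estimates.  */
static int
address_of (const struct insn *insns, int idx)
{
  int addr = 0;
  for (int i = 0; i < idx; i++)
    addr += insns[i].len;
  return addr;
}

/* Start from the shortest lengths and grow a branch whenever its
   target is out of reach.  Lengths never shrink, so the loop must
   terminate.  Returns the number of passes made.  */
static int
shorten_optimistically (struct insn *insns, int n)
{
  int passes = 0;
  bool changed;

  assert (insns != NULL && n >= 0);
  for (int i = 0; i < n; i++)
    insns[i].len = insns[i].short_len;

  do
    {
      changed = false;
      passes++;
      for (int i = 0; i < n; i++)
        {
          if (insns[i].target < 0 || insns[i].len == insns[i].long_len)
            continue;
          int disp = address_of (insns, insns[i].target)
                     - address_of (insns, i);
          if (abs (disp) > insns[i].range)
            {
              insns[i].len = insns[i].long_len;  /* grow, never shrink */
              changed = true;
            }
        }
    }
  while (changed);
  return passes;
}
```

Because each branch can grow at most once, the number of passes is bounded by
the number of branches plus one, so no artificial iteration cap is needed in
this simplified model.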

Richard.

> bootstrapped and regression tested on i686-pc-linux-gnu
>
> 2012-10-03  Joern Rennecke  
>
> * final.c (get_attr_length_1): Use direct recursion rather than
> calling get_attr_length.
> (get_attr_lock_length): New function.
> (INSN_VARIABLE_LENGTH_P): Define.
> (shorten_branches): Take HAVE_ATTR_lock_length into account.
> Don't overwrite non-delay slot insn lengths with the lengths of
> delay slot insns with same uid.
> * genattrtab.c (lock_length_str): New variable.
> (make_length_attrs): New parameter base.
> (main): Initialize lock_length_str.
> Generate lock_lengths attributes.
> * genattr.c (gen_attr): Emit declarations for lock_length attribute
> related functions.
> * doc/md.texi (node Insn Lengths): Document lock_length attribute.
>
> Index: doc/md.texi
> ===
> --- doc/md.texi (revision 192036)
> +++ doc/md.texi (working copy)
> @@ -8004,6 +8004,20 @@ (define_insn "jump"
>(const_int 6)))])
>  @end smallexample
>
> +@cindex lock_length
> +Usually, branch shortening is done assuming the worst case (i.e. longest)
> +lengths, and then iterating (if optimizing) to smaller lengths till
> +no further changes occur.  This does not work so well for architectures
> +that have very small minimum offsets and considerable jumps in instruction
> +lengths.
> +
> +If you define the @code{lock_length} attribute, branch shortening will
> +work the other way round: it starts out assuming minimum instruction
> +lengths and iterates from there.  In addition, the value of the
> +@code{lock_length} attribute does not decrease across iterations, and
> +the value computed for the @code{length} attribute will be no smaller
> +than that of the @code{lock_length} attribute.
> +
>  @end ifset
>  @ifset INTERNALS
>  @node Constant Attributes
> Index: final.c
> ===
> --- final.c (revision 192036)
> +++ final.c (working copy)
> @@ -312,6 +312,7 @@ dbr_sequence_length (void)
> `insn_current_length'.  */
>
>  static int *insn_lengths;
> +static char *uid_lock_length;
>
>  VEC(int,heap) *insn_addresses_;
>
> @@ -447,6 +448,20 @@ get_attr_length (rtx insn)
>return get_attr_length_1 (insn, insn_default_length);
>  }
>
> +#ifdef HAVE_ATTR_lock_length
> +int
> +get_attr_lock_length (rtx insn)
> +{
> +  if (uid_lock_length && insn_lengths_max_uid > INSN_UID (insn))
> +return uid_lock_length[INSN_UID (insn)];
> +  return get_attr_length_1 (insn, insn_min_lock_length);
> +}
> +#define INSN_VARIABLE_LENGTH_P(INSN) \
> +  (insn_variable_length_p (INSN) || insn_variable_lock_length_p (INSN))
> +#else
> +#define INSN_VARIABLE_LENGTH_P(INSN) (insn_variable_length_p (INSN))
> +#e

Re: [PATCH] Fix up DW_TAG_formal_parameter placement

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 09:42:59AM +0200, Richard Guenther wrote:
> This looks like the wrong place to fix things to me ... either we can
> fix this at the point we create the VAR_DECLs for the optimized away
> PARM_DECLs (or we should delay that until here?)

No, that is not possible.  There is no other block they could be added
to (they are added to DECL_INITIAL block), and they definitely need to
be added there, they are needed for the more common case where it is not 
inlined.
And in that case it is the right location, for non-inlined function
DECL_INITIAL block's BLOCK_VARS is added directly as children of the
DW_TAG_subprogram.

> or we fix it up
> in dwarf2out.c (how does this fix interact with stabs and the other
> debuginfo formats?

We can't do that either, dwarf2out doesn't have information whether blocks
are really used (as in, any insns/stmts mention that block in
INSN_BLOCK/gimple_block) or not, it is only correct to move the VAR_DECLs
with PARM_DECL DECL_ORIGIN (i.e. DW_TAG_formal_parameter) up if the outer
BLOCK is not referenced by any insn/stmt (i.e. if the ranges of the
inner block with the VAR_DECL and outer block are exactly the same).
If the outer block has range that is superset of the inner block's range,
then the move would invalidly say that the DW_TAG_formal_parameter
is available somewhere where it is not supposed to be available.

Initially I thought I'd do the moves in tree-ssa-live.c, in
remove_unused_scope_block_p it has information about what blocks are used
by any stmts and what are not.  But it would be terribly expensive,
for each VAR_DECL in a block where its BLOCK_SUPERCONTEXT wasn't originally
TREE_USED before remove_unused_scope_block_p (and such blocks up to a
!TREE_USED inlined_function_outer_scope_p), it would need to search
all the BLOCK_SUPERCONTEXT BLOCK_VARS to see if the VAR_DECL isn't present
there as well, and only if not, move to the inlined_function_outer_scope_p
BLOCK.

Doing it in tree-inline.c is IMHO the right spot, it is the place that
creates the extra artificial BLOCK around the remapped DECL_INITIAL block
and puts function arguments there.  At that point we know for sure that
the DECL_INITIAL block has the same ranges as the whole inline function,
and it is desirable to move all arguments to the outer block, not just
those that were still present in DECL_ARGUMENTS during inlining.
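
The move itself is an ordinary pointer-to-pointer list splice. A standalone
sketch of that idiom, using a hypothetical `struct var` with an `is_param`
flag in place of the real tree nodes and the DECL_ORIGIN check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a chained declaration.  */
struct var
{
  int id;
  bool is_param;     /* plays the role of "origin is a PARM_DECL" */
  struct var *chain; /* plays the role of TREE_CHAIN */
};

/* Unlink every parameter-like node from *INNER and push it onto the
   front of *OUTER.  Nodes that stay in *INNER keep their order.  */
static void
move_params (struct var **inner, struct var **outer)
{
  assert (inner != NULL && outer != NULL);

  struct var **p = inner;
  while (*p != NULL)
    {
      if ((*p)->is_param)
        {
          struct var *v = *p;
          *p = v->chain;      /* unlink from the inner list */
          v->chain = *outer;  /* push onto the outer list */
          *outer = v;
        }
      else
        p = &(*p)->chain;     /* advance only when nothing was removed */
    }
}
```

Note that pushing onto the front reverses the relative order of the moved
nodes, which is also what the loop in the patch does.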

If you want to be more specific on what is to be moved, we could either
add some VAR_DECL flag bit (but that is expensive, we don't have many),
or perhaps just check that DECL_CONTEXT (DECL_ORIGIN (v)) == DECL_ORIGIN (fn)
(but is that ever false?).

>  mentioning DWARF in tree-inline looks odd,
> unless we get rid of the other formats - something I'd of course
> welcome ;))

That can be fixed, I can replace the DWARF terminology with something more
fuzzy.

Jakub


Adjust gcc.dg/lower-subreg-1.c for SPARC

2012-10-04 Thread Eric Botcazou
Tested on SPARC/Solaris 10, applied on the mainline.


2012-10-04  Eric Botcazou  

PR rtl-optimization/54739
* gcc.dg/lower-subreg-1.c: Also skip on SPARC.


-- 
Eric Botcazou

Index: gcc.dg/lower-subreg-1.c
===
--- gcc.dg/lower-subreg-1.c	(revision 192031)
+++ gcc.dg/lower-subreg-1.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! { mips64 || { arm*-*-* ia64-*-* spu-*-* tilegx-*-* } } } } } */
+/* { dg-do compile { target { ! { mips64 || { arm*-*-* ia64-*-* sparc*-*-* spu-*-* tilegx-*-* } } } } } */
 /* { dg-options "-O -fdump-rtl-subreg1" } */
 /* { dg-skip-if "" { { i?86-*-* x86_64-*-* } && x32 } { "*" } { "" } } */
 /* { dg-require-effective-target ilp32 } */

Re: opts.c, gcc.c: Plug some memory leaks - and an out-of-bounds memory access

2012-10-04 Thread Richard Guenther
On Wed, Oct 3, 2012 at 11:01 PM, Tobias Burnus  wrote:
> Found using http://scan5.coverity.com/
>
> Built on x86-64-gnu-linux with C/C++/Fortran. I will now do an all-language
> build/regtest.
> OK when it passes?
>
> (Note to the save_string call: I reduced it by 2: The "+1" in the call makes
> it long (out of bounds) and the "+1" in temp_filename_length is not needed
> (but also doesn't harm) as "tmp" is null terminated and save_string adds
> another '\0' after copying "len" bytes.)

- prefix = concat (target_sysroot_suffix, prefix, NULL);
-  prefix = concat (sysroot_no_trailing_dir_separator, prefix, NULL);
+   {
+ char *tmp;
+ tmp = concat (target_sysroot_suffix, prefix, NULL);
+ prefix = concat (sysroot_no_trailing_dir_separator, tmp, NULL);
+ free (tmp);
+   }

prefix = concat (sysroot_no_trailing_dir_separator,
target_sysroot_suffix, prefix, NULL);

should be equivalent and easier to read, no?
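
A small sketch of why the two forms give the same string. It uses
`concat_sketch`, a hypothetical stand-in written here for illustration that
joins a NULL-terminated argument list into freshly allocated memory; the
wrapper names are likewise made up.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: concatenate a NULL-terminated list of
   strings into newly malloc'd memory.  */
static char *
concat_sketch (const char *first, ...)
{
  va_list ap;
  size_t len = 0;
  const char *s;

  va_start (ap, first);
  for (s = first; s != NULL; s = va_arg (ap, const char *))
    len += strlen (s);
  va_end (ap);

  char *out = malloc (len + 1);
  assert (out != NULL);

  char *p = out;
  va_start (ap, first);
  for (s = first; s != NULL; s = va_arg (ap, const char *))
    {
      size_t n = strlen (s);
      memcpy (p, s, n);
      p += n;
    }
  va_end (ap);
  *p = '\0';
  return out;
}

/* Two-step form with a temporary that has to be freed.  */
static char *
prefix_two_step (const char *sysroot, const char *suffix,
                 const char *prefix)
{
  char *tmp = concat_sketch (suffix, prefix, (const char *) NULL);
  char *res = concat_sketch (sysroot, tmp, (const char *) NULL);
  free (tmp);
  return res;
}

/* Single call: no temporary, nothing to leak.  */
static char *
prefix_one_step (const char *sysroot, const char *suffix,
                 const char *prefix)
{
  return concat_sketch (sysroot, suffix, prefix, (const char *) NULL);
}
```

The single call avoids the intermediate allocation entirely, so there is no
temporary to free.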

+  else
+   prefix = concat (sysroot_no_trailing_dir_separator, prefix, NULL);
+

btw, we're not caring too much about memleaks in the driver ...

Otherwise the patch looks ok with the above change.

Thanks,
Richard.

> Tobias


Fix -fdump-ada-spec

2012-10-04 Thread Arnaud Charlet
After changes by Sharad (Add option for dumping to stderr (issue6190057)),
-fdump-ada-spec is broken, and is now a no-op.

Admittedly, this is because -fdump-ada-spec is handled differently from
other -fdump-* switches, so this patch fixes support for -fdump-ada-spec
by using an approach similar to -fdump-go-spec, with regular switches
defined in c-family/c.opt. I've removed the handling of TDF_RAW, which was
a debugging option that was never really used, so it can simply be deleted.

Change is mostly trivial/mechanical.

Tested on x86_64-pc-linux-gnu, OK for trunk?

gcc/

2012-10-04  Arnaud Charlet  

* dumpfile.h, dumpfile.c: Remove TDI_ada.

c-family/

2012-10-04  Arnaud Charlet  

* c-ada-spec.c (print_ada_declaration): Remove handling of TDF_RAW.
* c.opt (-fdump-ada-spec, -fdump-ada-spec-slim): Move switch definition
out of dumpfile.h.

c/

2012-10-04  Arnaud Charlet  

* c-decl.c (c_write_global_declarations): Fix handling of
-fdump-ada-spec*.

cp/

2012-10-04  Arnaud Charlet  

* decl2.c (cp_write_global_declarations): Fix handling of
-fdump-ada-spec*.

Arno
--
Index: c-family/c.opt
===
--- c-family/c.opt  (revision 192062)
+++ c-family/c.opt  (working copy)
@@ -799,6 +799,14 @@ fdollars-in-identifiers
 C ObjC C++ ObjC++
 Permit '$' as an identifier character
 
+fdump-ada-spec
+C ObjC C++ ObjC++ RejectNegative Var(flag_dump_ada_spec)
+Write all declarations as Ada code transitively
+
+fdump-ada-spec-slim
+C ObjC C++ ObjC++ RejectNegative Var(flag_dump_ada_spec_slim)
+Write all declarations as Ada code for the given file only
+
 felide-constructors
 C++ ObjC++ Var(flag_elide_constructors) Init(1)
 
Index: c-family/c-ada-spec.c
===
--- c-family/c-ada-spec.c   (revision 192062)
+++ c-family/c-ada-spec.c   (working copy)
@@ -2535,7 +2535,6 @@ print_ada_declaration (pretty_printer *b
   int is_class = false;
   tree name = TYPE_NAME (TREE_TYPE (t));
   tree decl_name = DECL_NAME (t);
-  bool dump_internal = get_dump_file_info (TDI_ada)->pflags & TDF_RAW;
   tree orig = NULL_TREE;
 
   if (cpp_check && cpp_check (t, IS_TEMPLATE))
@@ -2705,8 +2704,7 @@ print_ada_declaration (pretty_printer *b
 }
   else
 {
-  if (!dump_internal
- && TREE_CODE (t) == VAR_DECL
+  if (TREE_CODE (t) == VAR_DECL
  && decl_name
  && *IDENTIFIER_POINTER (decl_name) == '_')
return 0;
@@ -2796,8 +2794,7 @@ print_ada_declaration (pretty_printer *b
 
   /* If this function has an entry in the dispatch table, we cannot
 omit it.  */
-  if (!dump_internal && !DECL_VINDEX (t)
- && *IDENTIFIER_POINTER (decl_name) == '_')
+  if (!DECL_VINDEX (t) && *IDENTIFIER_POINTER (decl_name) == '_')
{
  if (IDENTIFIER_POINTER (decl_name)[1] == '_')
return 0;
Index: c/c-decl.c
===
--- c/c-decl.c  (revision 192062)
+++ c/c-decl.c  (working copy)
@@ -10079,10 +10079,10 @@ c_write_global_declarations (void)
   gcc_assert (!current_scope);
 
   /* Handle -fdump-ada-spec[-slim]. */
-  if (dump_initialized_p (TDI_ada))
+  if (flag_dump_ada_spec || flag_dump_ada_spec_slim)
 {
   /* Build a table of files to generate specs for */
-  if (get_dump_file_info (TDI_ada)->pflags & TDF_SLIM)
+  if (flag_dump_ada_spec_slim)
collect_source_ref (main_input_filename);
   else
for_each_global_decl (collect_source_ref_cb);
Index: cp/decl2.c
===
--- cp/decl2.c  (revision 192062)
+++ cp/decl2.c  (working copy)
@@ -3698,9 +3698,9 @@ cp_write_global_declarations (void)
   cgraph_process_same_body_aliases ();
 
   /* Handle -fdump-ada-spec[-slim] */
-  if (dump_initialized_p (TDI_ada))
+  if (flag_dump_ada_spec || flag_dump_ada_spec_slim)
 {
-  if (get_dump_file_info (TDI_ada)->pflags & TDF_SLIM)
+  if (flag_dump_ada_spec_slim)
collect_source_ref (main_input_filename);
   else
collect_source_refs (global_namespace);
Index: dumpfile.c
===
--- dumpfile.c  (revision 192062)
+++ dumpfile.c  (working copy)
@@ -57,8 +57,7 @@ static struct dump_file_info dump_files[
0, 0, 0, 5},
   {".vcg", "tree-vcg", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
0, 0, 0, 6},
-  {".ads", "ada-spec", NULL, NULL, NULL, NULL, NULL, 0, 0, 0, 0, 7},
-#define FIRST_AUTO_NUMBERED_DUMP 8
+#define FIRST_AUTO_NUMBERED_DUMP 7
 
   {NULL, "tree-all", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
0, 0, 0, 0},
Index: dumpfile.h
===
--- dumpfile.h  (revision 192062)
+++ dumpfile.h  (working copy)
@@ -37,7 +37,6 @@ enum tree_dump_index
   TDI_nested,  /* dump each function after unnest

Re: [PATCH] Fix instability of -fschedule-insn for x86

2012-10-04 Thread Kirill Yukhin
Hi,

> Based on this opinion, the patch is OK for mainline, if there are no
...

Checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00187.html

K


Re: [Ping]FW: [PATCH] Cilk Plus merging to trunk (2 of n)

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 2:22 AM, Iyer, Balaji V  wrote:
> Hi Joseph,
> Did you get a chance to look at this submission? I think I have fixed 
> all the changes you have mentioned. Is it OK for trunk?
>
> Thanks,
>
> Balaji V. Iyer.
>
>>-Original Message-
>>From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>ow...@gcc.gnu.org] On Behalf Of Iyer, Balaji V
>>Sent: Wednesday, September 26, 2012 7:16 PM
>>To: Joseph Myers
>>Cc: gcc-patches@gcc.gnu.org; al...@redhat.com; r...@redhat.com;
>>l...@redhat.com
>>Subject: RE: [PATCH] Cilk Plus merging to trunk (2 of n)
>>
>>Hello Joseph,
>>   In my last patch, I forgot to add the change Richard Guenther wanted me
>>to make. He wanted me to move the ARRAY_NOTATION_REF node from tree.def
>>to c-family/c-common.def.  Here is a new one that has this change. I am sorry
>>for this.
>>
>>Here are ChangeLog entries:
>>
>>gcc/ChangeLog
>>2012-09-26  Balaji V. Iyer  
>>
>>* tree.h (array_notation_reduce_type): Added new enumerator.

This should be moved to c-tree.h then, and ...

>>* Makefile.in (OBJS): Added array-notation-common.o.
>>* doc/passes.texi (Cilk Plus Transformation): Documented array
>>notation and overall transformations for Cilk Plus.
>>* doc/invoke.texi (C Dialect Options): Documented -fcilkplus flag.
>>* doc/generic.texi (Storage References): Documented
>>ARRAY_NOTATION_REF
>>tree addition.
>>* tree-pretty-print.c (dump_generic_node): Added ARRAY_NOTATION_REF
>>case.

... this to c-pretty-print.c and

>>* array-notation-common.c: New file.

... this to the c-common/ directory.

Basically this should be a completely frontend-only patch.

Richard.

>>gcc/c-family/ChangeLog
>>2012-09-26  Balaji V. Iyer  
>>
>>* c-common.h (build_array_notation_expr): New function declaration.
>>(ARRAY_NOTATION_ARRAY): Added new #define.
>>(ARRAY_NOTATION_CHECK): Likewise.
>>(ARRAY_NOTATION_START): Likewise.
>>(ARRAY_NOTATION_LENGTH): Likewise.
>>(ARRAY_NOTATION_STRIDE): Likewise.
>>(ARRAY_NOTATION_TYPE): Likewise.
>>* c-common.def: Added new tree ARRAY_NOTATION_REF.
>>* c-common.c (c_define_builtins): Added a call to initialize array
>>notation builtin functions.
>>(c_common_init_ts): Set ARRAY_NOTATION_REF as typed.
>>* c.opt (-fcilkplus): Define new command line switch.
>>
>>gcc/c/ChangeLog
>>2012-09-26  Balaji V. Iyer  
>>
>>* c-typeck.c (convert_arguments): Added a check if tree contains
>>array notation expressions before throwing errors or doing anything.
>>* Make-lang.in (C_AND_OBJC_OBJS): Added c-array-notation.o.
>>* c-parser.c (c_parser_compound_statement): Check if array notation code
>>is used in tree, if so, then transform them into appropriate C code.
>>(c_parser_expr_no_commas): Check if array notation is used in LHS or
>>RHS, if so, then build array notation expression instead of regular
>>modify.
>>(c_parser_postfix_expression_after_primary): Added a check for colon(s)
>>after square braces, if so then handle it like an array notation.  Also,
>>break up array notations in unary op if found.
>>(c_parser_array_notation): New function.
>>* c-array-notation.c: New file.
>>
>>gcc/testsuite/ChangeLog
>>2012-09-26  Balaji V. Iyer  
>>
>>* gcc.dg/cilk-plus/array_notation/execute/execute.exp: New script.
>>* gcc.dg/cilk-plus/array_notation/compile/compile.exp: Likewise.
>>* gcc.dg/cilk-plus/array_notation/errors/errors.exp: Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/sec_implicit_ex.c: New test.
>>* gcc.dg/cilk-plus/array_notation/execute/if_test.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/gather_scatter.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/builtin_func_double2.c:
>>Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/builtin_func_double.c:
>>Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/builtin_fn_custom.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/builtin_fn_mutating.c:
>>Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/array_test_ND.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/array_test2.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/execute/array_test1.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/compile/sec_implicit_ex.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/compile/gather_scatter.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/compile/builtin_func_double2.c:
>>Likewise.
>>* gcc.dg/cilk-plus/array_notation/compile/array_test_ND.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/compile/if_test.c: Likewise.
>>* gcc.dg/cilk-plus/array_notation/compile/builtin_func_double.c:

Re: Fix -fdump-ada-spec

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 10:26 AM, Arnaud Charlet  wrote:
> After changes by Sharad (Add option for dumping to stderr (issue6190057)),
> -fdump-ada-spec is broken, and is now a no-op.
>
> Admittedly, this is because -fdump-ada-spec is handled differently from
> other -fdump-* switches, so this patch fixes support for -fdump-ada-spec
> by using an approach similar to -fdump-go-spec, and use regular switches
> via c/c.opt. I've removed the handling of TDF_RAW, which was a debugging
> option, and never really used, so can be simply deleted.
>
> Change is mostly trivial/mechanical.
>
> Tested on x86_64-pc-linux-gnu, OK for trunk?

Much cleaner indeed.

Ok,
Thanks,
Richard.

> gcc/
>
> 2012-10-04  Arnaud Charlet  
>
> * dumpfile.h, dumpfile.c: Remove TDI_ada.
>
> c-family/
>
> 2012-10-04  Arnaud Charlet  
>
> * c-ada-spec.c (print_ada_declaration): Remove handling of TDF_RAW.
> * c.opt (-fdump-ada-spec, -fdump-ada-spec-slim): Move switch definition
> out of dumpfile.h.
>
> c/
>
> 2012-10-04  Arnaud Charlet  
>
> * c-decl.c (c_write_global_declarations): Fix handling of
> -fdump-ada-spec*.
>
> cp/
>
> 2012-10-04  Arnaud Charlet  
>
> * decl2.c (cp_write_global_declarations): Fix handling of
> -fdump-ada-spec*.
>
> Arno
> --
> Index: c-family/c.opt
> ===
> --- c-family/c.opt  (revision 192062)
> +++ c-family/c.opt  (working copy)
> @@ -799,6 +799,14 @@ fdollars-in-identifiers
>  C ObjC C++ ObjC++
>  Permit '$' as an identifier character
>
> +fdump-ada-spec
> +C ObjC C++ ObjC++ RejectNegative Var(flag_dump_ada_spec)
> +Write all declarations as Ada code transitively
> +
> +fdump-ada-spec-slim
> +C ObjC C++ ObjC++ RejectNegative Var(flag_dump_ada_spec_slim)
> +Write all declarations as Ada code for the given file only
> +
>  felide-constructors
>  C++ ObjC++ Var(flag_elide_constructors) Init(1)
>
> Index: c-family/c-ada-spec.c
> ===
> --- c-family/c-ada-spec.c   (revision 192062)
> +++ c-family/c-ada-spec.c   (working copy)
> @@ -2535,7 +2535,6 @@ print_ada_declaration (pretty_printer *b
>int is_class = false;
>tree name = TYPE_NAME (TREE_TYPE (t));
>tree decl_name = DECL_NAME (t);
> -  bool dump_internal = get_dump_file_info (TDI_ada)->pflags & TDF_RAW;
>tree orig = NULL_TREE;
>
>if (cpp_check && cpp_check (t, IS_TEMPLATE))
> @@ -2705,8 +2704,7 @@ print_ada_declaration (pretty_printer *b
>  }
>else
>  {
> -  if (!dump_internal
> - && TREE_CODE (t) == VAR_DECL
> +  if (TREE_CODE (t) == VAR_DECL
>   && decl_name
>   && *IDENTIFIER_POINTER (decl_name) == '_')
> return 0;
> @@ -2796,8 +2794,7 @@ print_ada_declaration (pretty_printer *b
>
>/* If this function has an entry in the dispatch table, we cannot
>  omit it.  */
> -  if (!dump_internal && !DECL_VINDEX (t)
> - && *IDENTIFIER_POINTER (decl_name) == '_')
> +  if (!DECL_VINDEX (t) && *IDENTIFIER_POINTER (decl_name) == '_')
> {
>   if (IDENTIFIER_POINTER (decl_name)[1] == '_')
> return 0;
> Index: c/c-decl.c
> ===
> --- c/c-decl.c  (revision 192062)
> +++ c/c-decl.c  (working copy)
> @@ -10079,10 +10079,10 @@ c_write_global_declarations (void)
>gcc_assert (!current_scope);
>
>/* Handle -fdump-ada-spec[-slim]. */
> -  if (dump_initialized_p (TDI_ada))
> +  if (flag_dump_ada_spec || flag_dump_ada_spec_slim)
>  {
>/* Build a table of files to generate specs for */
> -  if (get_dump_file_info (TDI_ada)->pflags & TDF_SLIM)
> +  if (flag_dump_ada_spec_slim)
> collect_source_ref (main_input_filename);
>else
> for_each_global_decl (collect_source_ref_cb);
> Index: cp/decl2.c
> ===
> --- cp/decl2.c  (revision 192062)
> +++ cp/decl2.c  (working copy)
> @@ -3698,9 +3698,9 @@ cp_write_global_declarations (void)
>cgraph_process_same_body_aliases ();
>
>/* Handle -fdump-ada-spec[-slim] */
> -  if (dump_initialized_p (TDI_ada))
> +  if (flag_dump_ada_spec || flag_dump_ada_spec_slim)
>  {
> -  if (get_dump_file_info (TDI_ada)->pflags & TDF_SLIM)
> +  if (flag_dump_ada_spec_slim)
> collect_source_ref (main_input_filename);
>else
> collect_source_refs (global_namespace);
> Index: dumpfile.c
> ===
> --- dumpfile.c  (revision 192062)
> +++ dumpfile.c  (working copy)
> @@ -57,8 +57,7 @@ static struct dump_file_info dump_files[
> 0, 0, 0, 5},
>{".vcg", "tree-vcg", NULL, NULL, NULL, NULL, NULL, TDF_TREE,
> 0, 0, 0, 6},
> -  {".ads", "ada-spec", NULL, NULL, NULL, NULL, NULL, 0, 0, 0, 0, 7},
> -#define FIRST_AUTO_NUMBERED_DUMP 8
> +#define FIRST_AUTO_NUMBERED_DU

[Ada] Project_Dir should not modify the case of its path name

2012-10-04 Thread Arnaud Charlet
This patch ensures that when project'Project_Dir is used, the path will
not be modified, even on platforms where the file names are case
insensitive, such as Windows.
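
As a rough illustration of the distinction the patch relies on, here is a minimal C sketch that keeps two forms of a path: a canonical one, folded to lower case on case-insensitive file systems and used for comparisons, and a display one kept exactly as written. The struct, its fields and the function name are hypothetical and are not taken from the GNAT sources.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: neither the type nor the field names come from GNAT. */
struct path_name {
    char name[256];         /* canonical form, possibly folded to lower case */
    char display_name[256]; /* exactly as the user wrote it */
};

/* Fill both forms of PATH.  Returns 0 on success, -1 if PATH is too long. */
int path_name_set(struct path_name *p, const char *path, int case_insensitive_fs)
{
    size_t len = strlen(path);

    if (len >= sizeof p->name)
        return -1;

    /* The display form is never altered. */
    memcpy(p->display_name, path, len + 1);

    /* The canonical form is folded only when the file system ignores case.
       The loop also copies the terminating NUL. */
    for (size_t i = 0; i <= len; i++) {
        unsigned char c = (unsigned char) path[i];
        p->name[i] = case_insensitive_fs ? (char) tolower(c) : (char) c;
    }
    return 0;
}
```

With this layout, a value shown to the user, such as an attribute like 'Project_Dir, would be read from display_name, while name stays available for comparing paths.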

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-10-04  Vincent Celier  

* prj-proc.adb (Process_Package_Declaration): Use project
directory display path name as the value of 'Project_Dir.

Index: prj-proc.adb
===
--- prj-proc.adb(revision 192066)
+++ prj-proc.adb(working copy)
@@ -1588,7 +1588,7 @@
   Add_Attributes
 (Project,
  Project.Name,
- Name_Id (Project.Directory.Name),
+ Name_Id (Project.Directory.Display_Name),
  Shared,
  Shared.Packages.Table (New_Pkg).Decl,
  First_Attribute_Of


[Ada] Visibility error in the presence of private limited with clauses

2012-10-04 Thread Arnaud Charlet
This patch fixes a visibility error when compiling a unit DDP, when an
ancestor P of DDP has a private limited with clause on a descendant of P that
is itself an ancestor of DDP.

The following must compile quietly:
   
  gcc -c -gnat05 bg-el-lc.adb

---
package body BG.El.LC is
   overriding procedure Bind (E : access One_Port; K : in AKind) is
   begin
  null;
   end Bind;
end BG.El.LC;
---
package BG.El.LC is
   type Component is abstract new Element with null record;

private

   type One_Port is new Component with null record;

   overriding procedure Bind (E : access One_Port; K : in AKind);

end BG.El.LC;
---
with Ada.Strings.Bounded;
private
package BG.El is

   type Element is abstract tagged private;

   type AKind is (A, B);

   procedure Bind (E : access Element; K : in AKind) is abstract;

private

   type Element is abstract tagged null record;
end BG.El;
---
limited private with BG.El;
package BG is

   type Object is abstract tagged limited private;

   procedure Bind (Graph : in out Object) is abstract;

private
   type Object is abstract tagged limited null record;
end BG;

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-10-04  Ed Schonberg  

* sem_ch10.adb (Is_Ancestor_Unit): Make global, for use elsewhere.
(Install_Private_with_Clauses): if clause is private and limited,
do not install the limited view if the library unit is an ancestor
of the unit being compiled.  This unusual configuration occurs
when compiling a unit DDP, when an ancestor P of DDP has a
private limited with clause on a descendant of P that is itself
an ancestor of DDP.
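
The rule can be pictured on dotted unit names. The following C sketch is only an illustration under simplifying assumptions: units are plain strings compared case-sensitively, a unit is not counted as its own ancestor, and both function names are invented here. The compiler itself works on entities, not on strings.

```c
#include <stdbool.h>
#include <string.h>

/* True if ANC is a proper ancestor of UNIT in the dotted-name hierarchy,
   e.g. "BG.El" is an ancestor of "BG.El.LC".  Hypothetical helper. */
bool is_ancestor_name(const char *anc, const char *unit)
{
    size_t n = strlen(anc);

    /* UNIT must start with ANC and continue with a '.' separator, so that
       "BG.E" is not mistaken for an ancestor of "BG.El". */
    return n > 0 && strncmp(anc, unit, n) == 0 && unit[n] == '.';
}

/* Decide whether the limited view named by a private limited with clause
   should be installed while compiling CURRENT.  Hypothetical helper. */
bool should_install_limited_view(bool already_installed,
                                 const char *withed, const char *current)
{
    return !already_installed && !is_ancestor_name(withed, current);
}
```

In the test case above, the clause "limited private with BG.El" in BG is met while compiling BG.El.LC, so the sketch declines to install the limited view of BG.El.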

Index: sem_ch10.adb
===
--- sem_ch10.adb(revision 192066)
+++ sem_ch10.adb(working copy)
@@ -164,6 +164,11 @@
--  an enclosing scope. Iterate over context to find child units of U_Name
--  or of some ancestor of it.
 
+   function Is_Ancestor_Unit (U1 : Node_Id; U2 : Node_Id) return Boolean;
+   --  When compiling a unit Q descended from some parent unit P, a limited
+   --  with_clause in the context of P that names some other ancestor of Q
+   --  must not be installed because the ancestor is immediately visible.
+
function Is_Child_Spec (Lib_Unit : Node_Id) return Boolean;
--  Lib_Unit is a library unit which may be a spec or a body. Is_Child_Spec
--  returns True if Lib_Unit is a library spec which is a child spec, i.e.
@@ -3521,11 +3526,6 @@
   --  units. The shadow entities are created when the inserted clause is
   --  analyzed. Implements Ada 2005 (AI-50217).
 
-  function Is_Ancestor_Unit (U1 : Node_Id; U2 : Node_Id) return Boolean;
-  --  When compiling a unit Q descended from some parent unit P, a limited
-  --  with_clause in the context of P that names some other ancestor of Q
-  --  must not be installed because the ancestor is immediately visible.
-
   -
   -- Check_Renamings --
   -
@@ -3794,22 +3794,6 @@
  end if;
   end Expand_Limited_With_Clause;
 
-  --
-  -- Is_Ancestor_Unit --
-  --
-
-  function Is_Ancestor_Unit (U1 : Node_Id; U2 : Node_Id) return Boolean is
- E1 : constant Entity_Id := Defining_Entity (Unit (U1));
- E2 : Entity_Id;
-  begin
- if Nkind_In (Unit (U2), N_Package_Body, N_Subprogram_Body) then
-E2 := Defining_Entity (Unit (Library_Unit (U2)));
-return Is_Ancestor_Package (E1, E2);
- else
-return False;
- end if;
-  end Is_Ancestor_Unit;
-
--  Start of processing for Install_Limited_Context_Clauses
 
begin
@@ -4061,8 +4045,17 @@
 if Nkind (Item) = N_With_Clause
   and then Private_Present (Item)
 then
+   --  If the unit is an ancestor of the current one, it is the
+   --  case of a private limited with clause on a child unit, and
+   --  the compilation of one of its descendants. In that case the
+   --  limited view is irrelevant.
+
if Limited_Present (Item) then
-  if not Limited_View_Installed (Item) then
+  if not Limited_View_Installed (Item)
+and then
+  not Is_Ancestor_Unit (Library_Unit (Item),
+Cunit (Current_Sem_Unit))
+  then
  Install_Limited_Withed_Unit (Item);
   end if;
else
@@ -5269,6 +5262,22 @@
 (C_Unit, Cunit_Entity (Get_Source_Unit (Non_Limited_View (T;
end Is_Legal_Shadow_Entity_In_Body;
 
+   --
+   -- Is_Ancestor_Unit --
+   --
+
+   function Is_Ancestor_Unit (U1 : Node_Id; U2 : Node_Id) return Boolean is
+  E1 : constant Entity_Id := Defining_Ent

[Ada] Remaining fixes to get MINIMIZED through test suite

2012-10-04 Thread Arnaud Charlet
This patch makes two minor corrections to fix two remaining tests in
the test suite that failed if run with -gnato2 forced on. There is
also a minor optimization of Compile_Time_Compare which improves
the results in some cases (noticed during testing, but does not
have any effect on the test suite).

The first problem was in handling folding of /= in some cases
which showed up as a bogus complaint about a length check failing.
The following should compile cleanly with -gnato2 with no messages.

procedure Compov2 is
begin
   for J in 1 .. 1 loop
      declare
         Dest : constant Wide_String (J .. J + 6) := "is_copy";
      begin
         null;
      end;
   end loop;
end;

The other case was a mistake in handling of case expressions, which
shows up as an unexpected constraint error. No simple test case is
available for this problem.

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-10-04  Robert Dewar  

* checks.adb (Minimize_Eliminate_Overflow_Checks): Don't reanalyze
if/case expression if nothing has changed (just reexpand). Stops
case expression from generating incorrect temporary.
* exp_ch4.adb (Expand_Compare_Minimize_Eliminate_Overflow):
Fix cut and paste typo for range analysis in NE (not equal) case.
* sem_eval.adb (Compile_Time_Compare): Small optimization to
catch some more cases.
* types.ads (Suppressed_Or_Checked): New subtype of
Overflow_Check_Type.

Index: types.ads
===
--- types.ads   (revision 192066)
+++ types.ads   (working copy)
@@ -737,7 +737,9 @@
 
subtype Minimized_Or_Eliminated is
  Overflow_Check_Type range Minimized .. Eliminated;
-   --  Definte subtypes so that clients don't need to know ordering. Note that
+   subtype Suppressed_Or_Checked is
+ Overflow_Check_Type range Suppressed .. Checked;
+   --  Define subtypes so that clients don't need to know ordering. Note that
--  Overflow_Check_Type is not marked as an ordered enumeration type.
 
--  The following structure captures the state of check suppression or
Index: checks.adb
===
--- checks.adb  (revision 192066)
+++ checks.adb  (working copy)
@@ -34,6 +34,7 @@
 with Exp_Tss;  use Exp_Tss;
 with Exp_Util; use Exp_Util;
 with Elists;   use Elists;
+with Expander; use Expander;
 with Eval_Fat; use Eval_Fat;
 with Freeze;   use Freeze;
 with Lib;  use Lib;
@@ -1272,8 +1273,7 @@
 Apply_Range_Check (N, Typ);
  end if;
 
-  elsif (Is_Record_Type (Typ)
-   or else Is_Private_Type (Typ))
+  elsif (Is_Record_Type (Typ) or else Is_Private_Type (Typ))
 and then Has_Discriminants (Base_Type (Typ))
 and then Is_Constrained (Typ)
   then
@@ -6709,10 +6709,12 @@
--  to be done in bignum mode), and the determined ranges of the operands.
 
--  After possible rewriting of a constituent subexpression node, a call is
-   --  made to reanalyze the node after setting Analyzed to False. To avoid a
-   --  recursive call into the whole overflow apparatus, and important rule for
-   --  this reanalysis call is that either Do_Overflow_Check must be False, or
-   --  if it is set, then the overflow checking mode must be temporarily set
+   --  made to either reexpand the node (if nothing has changed) or reanalyze
+   --  the node (if it has been modified by the overflow check processing).
+   --  The Analyzed_flag is set False before the reexpand/reanalyze. To avoid
+   --  a recursive call into the whole overflow apparatus, an important rule
+   --  for this call is that either Do_Overflow_Check must be False, or if
+   --  it is set, then the overflow checking mode must be temporarily set
--  to Checked/Suppressed. Either step will avoid the unwanted recursion.
 
procedure Minimize_Eliminate_Overflow_Checks
@@ -6761,6 +6763,17 @@
   --  range, then we must convert such operands back to the result type.
   --  This switch is properly set only when Bignum_Operands is False.
 
+  procedure Reexpand (C : Suppressed_Or_Checked);
+  --  This is called when we have not modified the node, so we do not need
+  --  to reanalyze it. But we do want to reexpand it in either CHECKED
+  --  or SUPPRESSED mode (as indicated by the argument C) to get proper
+  --  expansion. It is important that we reset the mode to SUPPRESSED or
+  --  CHECKED, since if we leave it in MINIMIZED or ELIMINATED mode we
+  --  would reenter this routine recursively which would not be good!
+  --  Note that this is not just an optimization: testing has shown up
+  --  several complex cases in which reanalyzing an already analyzed node
+  --  causes incorrect behavior.
+
   function In_Result_Range return Boolean;
   --  Returns True iff Lo .. Hi ar

[Ada] Remaining fixes for -gnato3 (eliminated mode overflow checks)

2012-10-04 Thread Arnaud Charlet
This patch corrects a couple of errors in the handling of ELIMINATED mode
overflow checking. With this patch, the entire test suite passes with
-gnato3 mode forced on (there are some differences in output, but all
are expected).

The following three tests now work correctly.

The following compiles quietly with -gnato3 and outputs TRUE:

with Text_IO; use Text_IO;
procedure InGNATo3 is
   function K (X, Z : Integer) return Boolean is
   begin
      return X in 1 .. Z ** 10;
   end;
begin
   Put_Line (K (1, Integer'Last)'Img);
end;

The following test compiles with the messages shown, regardless
of -gnato mode, and in particular this is now the -gnato3 output.

 1. procedure whynonso3 (a : integer) is
 2.x : constant := 1 + a ** 10;
   1   2
>>> non-static expression used in number declaration
>>> "a" is not static constant or named number (RM 4.9(5))

 3. begin
 4.null;
 5. end;

The following test compiles quietly in -gnato3 mode:

procedure BadAleno3 is
   type Arr is array (Positive range <>) of Integer;
   N : Integer := 0;

   type ARR_DEF (D3 : INTEGER) is   record
      C1 : Arr (N .. D3);
   end record;

   A : Arr := (1, 2);
   X : constant ARR_DEF := (1, A );
begin
   null;
end BadAleno3;

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-10-04  Robert Dewar  

* exp_ch4.adb (Expand_Compare_Minimize_Eliminate_Overflow):
Deal with case where we get a bignum operand and cannot do a
range analysis.
* sem_eval.adb (Why_Not_Static): Deal with bignum operands
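
To make the first entry concrete, here is a small C sketch of deciding a comparison from operand ranges, with an explicit "range unknown" state standing in for the bignum case. The types and function names are invented for illustration. Only the equality test and the disjoint-ranges test mirror conditions visible in the exp_ch4.adb hunk; the rest is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: KNOWN is false when no range could be computed,
   playing the role of the No_Uint test in the patch. */
struct range { bool known; long lo, hi; };

enum verdict { ALWAYS_FALSE, ALWAYS_TRUE, UNDECIDED };

/* Try to decide L = R from the operand ranges alone. */
enum verdict fold_eq(struct range l, struct range r)
{
    if (!l.known || !r.known)
        return UNDECIDED;               /* no range information: do not fold */
    if (l.lo == l.hi && r.lo == r.hi && l.lo == r.lo)
        return ALWAYS_TRUE;             /* both are the same single value */
    if (l.lo > r.hi || l.hi < r.lo)
        return ALWAYS_FALSE;            /* ranges are disjoint */
    return UNDECIDED;
}

/* L /= R is the negation of L = R whenever the latter is decided. */
enum verdict fold_ne(struct range l, struct range r)
{
    switch (fold_eq(l, r)) {
    case ALWAYS_TRUE:  return ALWAYS_FALSE;
    case ALWAYS_FALSE: return ALWAYS_TRUE;
    default:           return UNDECIDED;
    }
}
```

When either operand has no known range, both functions return UNDECIDED and the comparison has to be carried out for real, which is the path the patch falls through to.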

Index: exp_ch4.adb
===
--- exp_ch4.adb (revision 192070)
+++ exp_ch4.adb (working copy)
@@ -2325,9 +2325,12 @@
   Minimize_Eliminate_Overflow_Checks
 (Right_Opnd (N), Rlo, Rhi, Top_Level => False);
 
-  --  See if the range information decides the result of the comparison
+  --  See if the range information decides the result of the comparison.
+  --  We can only do this if we in fact have full range information (which
+  --  won't be the case if either operand is bignum at this stage).
 
-  case N_Op_Compare (Nkind (N)) is
+  if Llo /= No_Uint and then Rlo /= No_Uint then
+ case N_Op_Compare (Nkind (N)) is
  when N_Op_Eq =>
 if Llo = Lhi and then Rlo = Rhi and then Llo = Rlo then
Set_True;
@@ -2369,12 +2372,13 @@
 elsif Llo > Rhi or else Lhi < Rlo then
Set_True;
 end if;
-  end case;
+ end case;
 
-  --  All done if we did the rewrite
+ --  All done if we did the rewrite
 
-  if Nkind (N) not in N_Op_Compare then
- return;
+ if Nkind (N) not in N_Op_Compare then
+return;
+ end if;
   end if;
 
   --  Otherwise, time to do the comparison
Index: sem_eval.adb
===
--- sem_eval.adb(revision 192070)
+++ sem_eval.adb(working copy)
@@ -37,6 +37,7 @@
 with Nmake;use Nmake;
 with Nlists;   use Nlists;
 with Opt;  use Opt;
+with Rtsfind;  use Rtsfind;
 with Sem;  use Sem;
 with Sem_Aux;  use Sem_Aux;
 with Sem_Cat;  use Sem_Cat;
@@ -5419,10 +5420,12 @@
 return;
  end if;
 
- --  Type must be scalar or string type
+ --  Type must be scalar or string type (but allow Bignum, since this
+ --  is really a scalar type from our point of view in this diagnosis).
 
  if not Is_Scalar_Type (Typ)
and then not Is_String_Type (Typ)
+   and then not Is_RTE (Typ, RE_Bignum)
  then
 Error_Msg_N
   ("static expression must have scalar or string type " &
@@ -5539,8 +5542,15 @@
 
  when N_Function_Call =>
 Why_Not_Static_List (Parameter_Associations (N));
-Error_Msg_N ("non-static function call (RM 4.9(6,18))!", N);
 
+--  Complain about non-static function call unless we have Bignum
+--  which means that the underlying expression is really some
+--  scalar arithmetic operation.
+
+if not Is_RTE (Typ, RE_Bignum) then
+   Error_Msg_N ("non-static function call (RM 4.9(6,18))!", N);
+end if;
+
  when N_Parameter_Association =>
 Why_Not_Static (Explicit_Actual_Parameter (N));
 


Re: [SH] PR 33135 - Remove mieee option in libgcc

2012-10-04 Thread Kaz Kojima
Oleg Endo  wrote:
> Since the -mieee behavior has been fixed, is enabled by default on SH
> and the additional flags in libgcc can be removed.
> 
> OK?

OK.

Regards,
kaz


[Ada] Legality of aspects specified on a full view

2012-10-04 Thread Arnaud Charlet
In Ada 2012, certain aspects, such as Type_Invariant, can be specified on a
partial view of a type, or on the full view, but not on both. This patch
rejects such duplications cleanly.

The command:

gcc -c -gnat12 -gnata r.ads

must yield:

r.ads:5:32: aspect already specified in private declaration

---
package R is
type T is private with Type_Invariant => Non_Null (T);
function Non_Null (X : T) return Boolean;
private
type T is new Integer with Type_Invariant => T /= 0;
function Non_Null (X : T) return Boolean is (X /= 0);
end R;

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-10-04  Ed Schonberg  

* sem_ch3.adb (Check_Duplicate_Aspects): Diagnose properly
aspects that appear in the partial and the full view of a type.
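
The check amounts to a nested scan over the two aspect lists. A minimal C sketch of that idea follows, assuming aspects are represented by their names as plain strings and compared case-sensitively; the function name is invented here, and the compiler compares name table entries, not strings.

```c
#include <stddef.h>
#include <string.h>

/* Return the index in FULL of the first aspect that is also present in
   PARTIAL, or -1 when the two lists share no aspect.  Hypothetical helper. */
int first_duplicate_aspect(const char *const partial[], size_t n_partial,
                           const char *const full[], size_t n_full)
{
    for (size_t f = 0; f < n_full; f++)
        for (size_t p = 0; p < n_partial; p++)
            if (strcmp(partial[p], full[f]) == 0)
                return (int) f;   /* report the first clash and stop */
    return -1;
}
```

For the package R above, the partial view carries Type_Invariant and so does the full view, so the sketch would flag the aspect on the full view, which is where the error message is posted.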

Index: sem_ch3.adb
===
--- sem_ch3.adb (revision 192066)
+++ sem_ch3.adb (working copy)
@@ -23,6 +23,7 @@
 --  --
 --
 
+with Aspects;  use Aspects;
 with Atree;use Atree;
 with Checks;   use Checks;
 with Debug;use Debug;
@@ -14805,6 +14806,11 @@
   New_Id   : Entity_Id;
   Prev_Par : Node_Id;
 
+  procedure Check_Duplicate_Aspects;
+  --  Check that aspects specified in a completion have not been specified
+  --  already in the partial view. Type_Invariant and others can be
+  --  specified on either view but never on both.
+
   procedure Tag_Mismatch;
   --  Diagnose a tagged partial view whose full view is untagged.
   --  We post the message on the full view, with a reference to
@@ -14813,6 +14819,38 @@
   --  so we determine the position of the error message from the
   --  respective slocs of both.
 
+  -
+  -- Check_Duplicate_Aspects --
+  -
+  procedure Check_Duplicate_Aspects is
+ Prev_Aspects   : constant List_Id := Aspect_Specifications (Prev_Par);
+ Full_Aspects   : constant List_Id := Aspect_Specifications (N);
+ F_Spec, P_Spec : Node_Id;
+
+  begin
+ if Present (Prev_Aspects) and then Present (Full_Aspects) then
+F_Spec := First (Full_Aspects);
+while Present (F_Spec) loop
+   P_Spec := First (Prev_Aspects);
+   while Present (P_Spec) loop
+  if
+Chars (Identifier (P_Spec)) = Chars (Identifier (F_Spec))
+  then
+ Error_Msg_N
+   ("aspect already specified in private declaration",
+ F_Spec);
+ Remove (F_Spec);
+ return;
+  end if;
+
+  Next (P_Spec);
+   end loop;
+
+   Next (F_Spec);
+end loop;
+ end if;
+  end Check_Duplicate_Aspects;
+
   --
   -- Tag_Mismatch --
   --
@@ -15022,6 +15060,10 @@
  ("declaration of full view must appear in private part", N);
 end if;
 
+if Ada_Version >= Ada_2012 then
+   Check_Duplicate_Aspects;
+end if;
+
 Copy_And_Swap (Prev, Id);
 Set_Has_Private_Declaration (Prev);
 Set_Has_Private_Declaration (Id);


[Ada]: Remove __gl_zero_cost_exceptions in the binder generated file

2012-10-04 Thread Arnaud Charlet
This variable wasn't used anymore.
No functional change. 

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-10-04  Tristan Gingold  

* init.c (__gl_zero_cost_exceptions): Comment it as not used
anymore.
* bindgen.adb (Gen_Adainit): Do not emit Zero_Cost_Exceptions
anymore.

Index: bindgen.adb
===
--- bindgen.adb (revision 192066)
+++ bindgen.adb (working copy)
@@ -137,7 +137,6 @@
-- Num_Interrupt_States  : Integer;
-- Unreserve_All_Interrupts  : Integer;
-- Exception_Tracebacks  : Integer;
-   -- Zero_Cost_Exceptions  : Integer;
-- Detect_Blocking   : Integer;
-- Default_Stack_Size: Integer;
-- Leap_Seconds_Support  : Integer;
@@ -216,9 +215,6 @@
--  tracebacks are provided by default, so a value of zero for this
--  parameter does not necessarily mean no trace backs are available.
 
-   --  Zero_Cost_Exceptions is set to one if zero cost exceptions are used for
-   --  this partition, and to zero if longjmp/setjmp exceptions are used.
-
--  Detect_Blocking indicates whether pragma Detect_Blocking is active or
--  not. A value of zero indicates that the pragma is not present, while a
--  value of 1 signals its presence in the partition.
@@ -607,9 +603,6 @@
  """__gl_exception_tracebacks"");");
  end if;
 
- WBI ("  Zero_Cost_Exceptions : Integer;");
- WBI ("  pragma Import (C, Zero_Cost_Exceptions, " &
-  """__gl_zero_cost_exceptions"");");
  WBI ("  Detect_Blocking : Integer;");
  WBI ("  pragma Import (C, Detect_Blocking, " &
   """__gl_detect_blocking"");");
@@ -803,17 +796,6 @@
 WBI ("  Exception_Tracebacks := 1;");
  end if;
 
- Set_String ("  Zero_Cost_Exceptions := ");
-
- if Zero_Cost_Exceptions_Specified then
-Set_String ("1");
- else
-Set_String ("0");
- end if;
-
- Set_String (";");
- Write_Statement_Buffer;
-
  Set_String ("  Detect_Blocking := ");
 
  if Detect_Blocking then
Index: init.c
===
--- init.c  (revision 192066)
+++ init.c  (working copy)
@@ -103,12 +103,14 @@
 int   __gl_num_interrupt_states  = 0;
 int   __gl_unreserve_all_interrupts  = 0;
 int   __gl_exception_tracebacks  = 0;
-int   __gl_zero_cost_exceptions  = 0;
 int   __gl_detect_blocking   = 0;
 int   __gl_default_stack_size= -1;
 int   __gl_leap_seconds_support  = 0;
 int   __gl_canonical_streams = 0;
 
+/* This value is not used anymore, but kept for bootstrapping purpose.  */
+int   __gl_zero_cost_exceptions  = 0;
+
 /* Indication of whether synchronous signal handler has already been
installed by a previous call to adainit.  */
 int  __gnat_handler_installed  = 0;


[Ada] New preprocessor switch -a

2012-10-04 Thread Arnaud Charlet
This change introduces a new switch "-a" (all source text preserved) for
gnatprep and the integrated preprocessor, causing all source text to be
preserved (i.e. the "if", all "elsif" and the "else" branch of a
#if construct all show up in the output). This is useful to perform
simple style checks on all branches.

The following command must produce the shown output:
$ gnatprep -a -c all_source_.adb.in all_source_text.adb
$ cat all_source_text.adb
procedure All_Source_Text is
beGin
   Some_Code;

--! #if Some_Condition then
   Do_Something;
--! #else
   Do_Something_else;
--! #end if;

   Some_More_Code;
end All_Source_Text;

Input:
procedure All_Source_Text is
beGin
   Some_Code;

#if Some_Condition then
   Do_Something;
#else
   Do_Something_else;
#end if;

   Some_More_Code;
end All_Source_Text;

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-10-04  Thomas Quinot  

* prep.adb, prepcomp.adb, gprep.adb, opt.ads: New preprocessor switch
-a (all source text preserved).
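
The effect of combining -a with -c on a single line, as seen in the example output above, can be sketched in C as follows. This is not how gnatprep is implemented; the function name is invented, and the sketch assumes a directive is recognized by a '#' in the first column, as in the example.

```c
#include <stddef.h>
#include <string.h>

/* Copy LINE to OUT as it would appear with all source text preserved:
   preprocessor directives are turned into "--! " comments, every other
   line is kept unchanged, whatever branch it belongs to.
   Returns 0 on success, -1 if OUT is too small.  Hypothetical helper. */
int preserve_line(const char *line, char *out, size_t out_size)
{
    const char *prefix = (line[0] == '#') ? "--! " : "";
    size_t plen = strlen(prefix);
    size_t llen = strlen(line);

    if (plen + llen + 1 > out_size)
        return -1;

    memcpy(out, prefix, plen);
    memcpy(out + plen, line, llen + 1);   /* includes the terminating NUL */
    return 0;
}
```

Since no line is ever dropped, both Do_Something and Do_Something_else reach the output, which is what makes style checks on all branches possible.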

Index: prep.adb
===
--- prep.adb(revision 192066)
+++ prep.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 2002-2010, Free Software Foundation, Inc. --
+--  Copyright (C) 2002-2012, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -292,8 +292,8 @@
  Result.Value := End_String;
   end if;
 
-  --  Now, check the syntax of the symbol (we don't allow accented and
-  --  wide characters)
+  --  Now, check the syntax of the symbol (we don't allow accented or
+  --  wide characters).
 
   if Name_Buffer (1) not in 'a' .. 'z'
 and then Name_Buffer (1) not in 'A' .. 'Z'
@@ -356,7 +356,7 @@
begin
   --  Always return False when not inside an #if statement
 
-  if Pp_States.Last = Ground then
+  if Opt.No_Deletion or else Pp_States.Last = Ground then
  return False;
   else
  return Pp_States.Table (Pp_States.Last).Deleting;
Index: prepcomp.adb
===
--- prepcomp.adb(revision 192066)
+++ prepcomp.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 2003-2010, Free Software Foundation, Inc. --
+--  Copyright (C) 2003-2012, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -60,6 +60,7 @@
   Undef_False  : Boolean:= False;
   Always_Blank : Boolean:= False;
   Comments : Boolean:= False;
+  No_Deletion  : Boolean:= False;
   List_Symbols : Boolean:= False;
   Processed: Boolean:= False;
end record;
@@ -73,6 +74,7 @@
   Undef_False  => False,
   Always_Blank => False,
   Comments => False,
+  No_Deletion  => False,
   List_Symbols => False,
   Processed=> False);
 
@@ -330,6 +332,16 @@
--  significant.
 
case Sinput.Source (Token_Ptr) is
+  when 'a' =>
+
+ --  All source text preserved (also implies -u)
+
+ if Name_Len = 1 then
+Current_Data.No_Deletion := True;
+Current_Data.Undef_False := True;
+OK := True;
+ end if;
+
   when 'u' =>
 
  --  Undefined symbol are False
@@ -581,15 +593,15 @@
 
   --  Set the preprocessing flags according to the preprocessing data
 
-  if Current_Data.Comments and then not Current_Data.Always_Blank then
+  if Current_Data.Comments and not Current_Data.Always_Blank then
  Comment_Deleted_Lines := True;
  Blank_Deleted_Lines   := False;
-
   else
  Comment_Deleted_Lines := False;
  Blank_Deleted_Lines   := True;
   end if;
 
+  No_Deletion := Current_Data.No_Deletion;
   Undefined_Symbols_Are_False := Current_Data.Undef_False;
   List_Preprocessing_Symbols  := Current_Data.List_Symbols;
 
I

[Ada] Warn on Ada 2012 set membership test duplicate element

2012-10-04 Thread Arnaud Charlet
This patch adds a warning if a duplicate literal entry is found in
an Ada 2012 set membership, as shown by this example:

     1. pragma Ada_2012;
     2. package Dupset is
     3.    a : integer;
     4.    b : character;
     5.    c : boolean := a in 1 |
     6.                        2 |
     7.                        3 |
     8.                        1 |
                               |
        >>> warning: duplicate of value given at line 5

     9.                        5;
    10.    d : boolean := b in 'a' |
    11.                        'b' |
    12.                        'c' |
    13.                        'b';
                               |
        >>> warning: duplicate of value given at line 11

    14.
    15.    type Day is (Mon, Tue, Wed, Thu, Fri);
    16.    x : Day;
    17.    e : boolean := x in Mon | Tue |
    18.                        Wed | Mon;
                                     |
        >>> warning: duplicate of value given at line 17

    19. end;

Tested on x86_64-pc-linux-gnu, committed on trunk
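
As a rough illustration of the check described above (not the GNAT implementation), the quadratic duplicate scan can be sketched in Python. The function name and the use of None for a non-static alternative are assumptions made for this sketch only.

```python
def find_duplicates(alternatives):
    """Return (later_index, earlier_index) pairs for repeated values.

    `alternatives` holds the static value of each alternative, with None
    standing in for an alternative that is not a static literal
    (a hypothetical encoding chosen for this sketch).
    """
    seen = []        # (index, value) of the static alternatives so far
    duplicates = []
    for index, value in enumerate(alternatives):
        if value is None:
            continue  # non-static alternatives are not checked
        # Compare against every earlier static alternative: quadratic,
        # which is acceptable because membership sets are small.
        for earlier_index, earlier_value in seen:
            if earlier_value == value:
                duplicates.append((index, earlier_index))
        seen.append((index, value))
    return duplicates


print(find_duplicates([1, 2, 3, 1, 5]))
print(find_duplicates(["a", "b", "c", "b"]))
```

Each reported pair names the later alternative (where the warning is attached) and the earlier one it duplicates (the location quoted in the message).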

2012-10-04  Robert Dewar  

* sem_res.adb (Resolve_Set_Membership): Warn on duplicates.

Index: sem_res.adb
===
--- sem_res.adb (revision 192066)
+++ sem_res.adb (working copy)
@@ -7685,10 +7685,11 @@
   
 
   procedure Resolve_Set_Membership is
- Alt : Node_Id;
+ Alt  : Node_Id;
+ Ltyp : constant Entity_Id := Etype (L);
 
   begin
- Resolve (L, Etype (L));
+ Resolve (L, Ltyp);
 
  Alt := First (Alternatives (N));
  while Present (Alt) loop
@@ -7699,11 +7700,51 @@
 if not Is_Entity_Name (Alt)
   or else not Is_Type (Entity (Alt))
 then
-   Resolve (Alt, Etype (L));
+   Resolve (Alt, Ltyp);
 end if;
 
 Next (Alt);
  end loop;
+
+ --  Check for duplicates for discrete case
+
+ if Is_Discrete_Type (Ltyp) then
+declare
+   type Ent is record
+  Alt : Node_Id;
+  Val : Uint;
+   end record;
+
+   Alts  : array (0 .. List_Length (Alternatives (N))) of Ent;
+   Nalts : Nat;
+
+begin
+   --  Loop checking duplicates. This is quadratic, but giant sets
+   --  are unlikely in this context so it's a reasonable choice.
+
+   Nalts := 0;
+   Alt := First (Alternatives (N));
+   while Present (Alt) loop
+  if Is_Static_Expression (Alt)
+and then (Nkind_In (Alt, N_Integer_Literal,
+ N_Character_Literal)
+   or else Nkind (Alt) in N_Has_Entity)
+  then
+ Nalts := Nalts + 1;
+ Alts (Nalts) := (Alt, Expr_Value (Alt));
+
+ for J in 1 .. Nalts - 1 loop
+if Alts (J).Val = Alts (Nalts).Val then
+   Error_Msg_Sloc := Sloc (Alts (J).Alt);
+   Error_Msg_N ("duplicate of value given#?", Alt);
+end if;
+ end loop;
+  end if;
+
+  Alt := Next (Alt);
+   end loop;
+end;
+ end if;
   end Resolve_Set_Membership;
 
--  Start of processing for Resolve_Membership_Op


[Ada] Fix value of GNAT.Command_Line.Full_Switch on invalid switch

2012-10-04 Thread Arnaud Charlet
This patch fixes the value returned by Full_Switch when the user
provides an invalid long switch (instead of returning "--", Full_Switch
now returns "--long"), as in this example:

with GNAT.Command_Line;   use GNAT.Command_Line;
with Ada.Text_IO;         use Ada.Text_IO;
procedure Main is
begin
   while True loop
      case Getopt ("o: -long: s") is
         when 'o' | 's' => null;
         when '-' => null;
         when others => exit;
      end case;
   end loop;
exception
   when GNAT.Command_Line.Invalid_Switch =>
      Put_Line ("invalid switch: " & Full_Switch);
end Main;

Calling "./main --lond" will now indicate that "--lond" is invalid,
not "--".
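
The rule for choosing how much of the argument to report can be modelled roughly as follows. This is a simplified Python sketch, not the GNAT.Command_Line code: the function name, the 0-based current_index parameter, and the way the switch character is prepended are assumptions made for illustration.

```python
def reported_switch(arg, current_index, concatenate, switch_char="-"):
    """Model of the text reported for an unrecognized switch.

    arg           -- full command-line argument, e.g. "--lond"
    current_index -- 0-based index of the first unrecognized character
                     after the leading switch character (assumed)
    concatenate   -- whether short switches may be concatenated
    """
    last = len(arg) - 1
    if len(arg) > 1 and arg[1] == switch_char:
        # Long switch (doubled switch character): report the whole rest.
        end_index = last
    elif concatenate:
        # Short switch with concatenation: report one character only.
        end_index = current_index
    else:
        # Short switch without concatenation: report the whole argument.
        end_index = last
    return switch_char + arg[current_index:end_index + 1]


print(reported_switch("--lond", 1, concatenate=True))
print(reported_switch("-daf", 2, concatenate=True))
print(reported_switch("-daf", 1, concatenate=False))
```

Under this model the long-switch test comes first, so the Concatenate setting no longer truncates "--lond" to "--".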

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-10-04  Emmanuel Briot  

* g-comlin.adb (Getopt): Fix value of Full_Switch returned in case of
invalid switch.

Index: g-comlin.adb
===
--- g-comlin.adb(revision 192066)
+++ g-comlin.adb(working copy)
@@ -39,6 +39,10 @@
 
 package body GNAT.Command_Line is
 
+   --  General note: this entire body could use much more commenting. There
+   --  are large sections of uncommented code throughout, and many formal
+   --  parameters of local subprograms are not documented at all ???
+
package CL renames Ada.Command_Line;
 
type Switch_Parameter_Type is
@@ -56,6 +60,12 @@
   Extra: Character := ASCII.NUL);
pragma Inline (Set_Parameter);
--  Set the parameter that will be returned by Parameter below
+   --
+   --  Extra is a character that needs to be added when reporting Full_Switch.
+   --  (it will in general be the switch character, for instance '-').
+   --  Otherwise, Full_Switch will report 'f' instead of '-f'. In particular,
+   --  it needs to be set when reporting an invalid switch or handling '*'.
+   --
--  Parameters need to be defined ???
 
function Goto_Next_Argument_In_Section (Parser : Opt_Parser) return Boolean;
@@ -95,9 +105,9 @@
   Index_In_Switches : out Integer;
   Switch_Length : out Integer;
   Param : out Switch_Parameter_Type);
-   --  Return the Longest switch from Switches that at least partially
-   --  partially Arg. Index_In_Switches is set to 0 if none matches.
-   --  What are other parameters??? in particular Param is not always set???
+   --  Return the Longest switch from Switches that at least partially matches
+   --  Arg. Index_In_Switches is set to 0 if none matches. What are other
+   --  parameters??? in particular Param is not always set???
 
procedure Unchecked_Free is new Ada.Unchecked_Deallocation
  (Argument_List, Argument_List_Access);
@@ -663,17 +673,45 @@
 
  if Index_Switches = 0 then
 
---  Depending on the value of Concatenate, the full switch is
---  a single character or the rest of the argument.
+--  Find the current switch that we did not recognize. This is in
+--  fact difficult because Getopt does not know explicitly about
+--  short and long switches. Ideally, we would want the following
+--  behavior:
 
-End_Index :=
-  (if Concatenate then Parser.Current_Index else Arg'Last);
+--  * for short switches, with Concatenate:
+--if -a is not recognized, and the command line has -daf
+--we should report the invalid switch as "-a".
 
+--  * for short switches, without Concatenate:
+--we should report the invalid switch as "-daf".
+
+--  * for long switches:
+--if the command line is "--long" we should report --long
+--as unrecognized.
+
+--  Unfortunately, the fact that long switches start with a
+--  duplicate switch character is just a convention (so we could
+--  have a long switch "-long" for instance). We'll still rely on
+--  this convention here to try and get as helpful an error message
+--  as possible.
+
+--  Long switch case (starting with double switch character)
+
+if Arg (Arg'First + 1) = Parser.Switch_Character then
+   End_Index := Arg'Last;
+
+--  Short switch case
+
+else
+   End_Index :=
+ (if Concatenate then Parser.Current_Index else Arg'Last);
+end if;
+
 if Switches (Switches'First) = '*' then
 
-   --  Always prepend the switch character, so that users know that
-   --  this comes from a switch on the command line. This is
-   --  especially important when Concatenate is False, since
+   --  Always prepend the switch character, so that users know
+   --  that this comes from a switch on the command line. This
+   --  is especially

Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

2012-10-04 Thread Steven Bosscher
On Wed, Oct 3, 2012 at 5:35 PM, Steven Bosscher  wrote:
> The "worst" result is this:
> Compressing live ranges: from 726174 to 64496 - 8%, pre_count 40476128, 
> post_count 12483414
>
> But that's still a lot better than before the patch for the same function:
> Compressing live ranges: from 1742569 to 73069 - 4%, pre_count 40842330, 
> post_count 12479992

Walking basic blocks with FOR_EACH_BB_REVERSE gives:

Only FOR_EACH_BB_REVERSE:
Compressing live ranges: from 1742579 to 429746 - 24% pre_count
41106212, post_count 34376494
Compressing live ranges: from 1742569 to 63000 - 3% pre_count
40835340, post_count 11055747

FOR_EACH_BB_REVERSE + need_curr_point_incr:
Compressing live ranges: from 726184 to 416529 - 57% pre_count
40743516, post_count 34376846
Compressing live ranges: from 726174 to 61840 - 8% pre_count 40472806,
post_count 11055747

The combination of the two changes takes ~20s off the ~180s for "LRA
create live ranges".
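
For readers comparing the dumps, the percentages appear to be the truncated ratio of live ranges after compression to live ranges before it. The following Python sketch recomputes them from the figures quoted in this thread; the truncating-division formula and the run labels are assumptions inferred from the numbers, not taken from lra-lives.c.

```python
def compression_percent(before, after):
    # Assumed formula: ranges remaining after compression, as a
    # truncated percentage of the ranges present before compression.
    return after * 100 // before


# (label, ranges before, ranges after, percentage printed in the dump)
runs = [
    ("before patch", 1742569, 73069, 4),
    ("need_curr_point_incr only", 726174, 64496, 8),
    ("FOR_EACH_BB_REVERSE only (1)", 1742579, 429746, 24),
    ("FOR_EACH_BB_REVERSE only (2)", 1742569, 63000, 3),
    ("both changes (1)", 726184, 416529, 57),
    ("both changes (2)", 726174, 61840, 8),
]

for label, before, after, printed in runs:
    computed = compression_percent(before, after)
    print(f"{label}: {before} -> {after}, {computed}% (dump says {printed}%)")

# Rough share of "LRA create live ranges" time saved: ~20s of ~180s.
time_saved_percent = round(20 / 180 * 100)
print(f"time saved: about {time_saved_percent}%")
```

Note that a higher percentage here means less compression, so the 8% figure with a much smaller starting count is still the better overall result.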

Ciao!
Steven


Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars

2012-10-04 Thread Tobias Burnus
Thanks for the suggestions. The attached patch changes all the "."-something
symbol names that I found.


Built and regtested on x86-64-gnu-linux.
OK for the trunk and 4.7? (".saved_dovar" also occurs in 4.6; we could
also backport that part to 4.6, but I am not sure whether it is needed.)


We probably should also bump the .mod version (and also commit the patch
http://gcc.gnu.org/ml/fortran/2012-04/msg00033.html in a timely manner). Comments?


Tobias


Am 04.10.2012 01:07, schrieb David Edelsohn:

For C and C++, identifiers beginning with underscore and upper case
letter or with two underscores are reserve to the implementation.  C++
uses _Z for mangling.

Maybe Fortran could prepend "_F".  Something beginning with an
underscore seems like a much better choice, given the rules about
reserved identifiers.

Thanks, David

On Wed, Oct 3, 2012 at 5:00 PM, Tobias Burnus  wrote:

David,


David Edelsohn wrote:

I am not sure why you chose a period and how best to correct this.


Well, in principle any name which the user cannot enter would do. (Not
enter: At least not as valid Fortran identifier.)

The reason for choosing "." is that  is used elsewhere in
gfortran for such identifier for the string-length variable belonging to
, e.g. "._result" in trans-decl.c. I assume the reason that it
didn't pop up with those is that those are local variables, but I wouldn't
be surprised if it would break elsewhere.

I wonder whether "@" would work, otherwise, one could also use "_". The only
other problem is that it will break the ABI. On the other hand, it's a
rather new feature and if we bump the .mod version number, the chance that
one effectively forces the user to re-compile is rather high. So far we
always bumped the .mod version number as something changed. There are also
some other patches pending which effectively lead to a bump in the .mod
version.

(The .mod version won't affect code which doesn't use modules such as
BLAS/LAPACK or any Fortran 66/77 code, but those won't be affected by the
ABI change anyway as there the name doesn't propagate as it does with
modules..)


Thanks for investigating the test-suite failure.

Tobias


2012-10-04  Tobias Burnus  

	* trans-decl.c (gfc_create_string_length, create_function_arglist):
	Don't create a symbol which contains a dot.
	* trans-stmt.c (gfc_trans_simple_do, gfc_trans_do): Ditto.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 910b150..f41fc8b 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -1097,9 +1097,9 @@ gfc_create_string_length (gfc_symbol * sym)
 
   /* Also prefix the mangled name.  */
   if (sym->module)
-	name = gfc_get_string (".__%s_MOD_%s", sym->module, sym->name);
+	name = gfc_get_string ("_F_%s_MOD_%s", sym->module, sym->name);
   else
-	name = gfc_get_string (".%s", sym->name);
+	name = gfc_get_string ("_F%s", sym->name);
 
   length = build_decl (input_location,
 			   VAR_DECL, get_identifier (name),
@@ -1984,7 +1984,7 @@ create_function_arglist (gfc_symbol * sym)
 
 	  length = build_decl (input_location,
 			   PARM_DECL,
-			   get_identifier (".__result"),
+			   get_identifier ("_Flen__result"),
 			   len_type);
 	  if (!sym->ts.u.cl->length)
 	{
@@ -2007,7 +2007,7 @@ create_function_arglist (gfc_symbol * sym)
 		{
 		  tree len = build_decl (input_location,
 	 VAR_DECL,
-	 get_identifier ("..__result"),
+	 get_identifier ("_Flen2__result"),
 	 gfc_charlen_type_node);
 		  DECL_ARTIFICIAL (len) = 1;
 		  TREE_USED (len) = 1;
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 204f069..37fc6ee 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -1376,7 +1376,7 @@ gfc_trans_simple_do (gfc_code * code, stmtblock_t *pblock, tree dovar,
   /* Save value for do-tinkering checking. */
   if (gfc_option.rtcheck & GFC_RTCHECK_DO)
 {
-  saved_dovar = gfc_create_var (type, ".saved_dovar");
+  saved_dovar = gfc_create_var (type, "_F_saved_dovar");
   gfc_add_modify_loc (loc, pblock, saved_dovar, dovar);
 }
 
@@ -1581,7 +1581,7 @@ gfc_trans_do (gfc_code * code, tree exit_cond)
   /* Save value for do-tinkering checking. */
   if (gfc_option.rtcheck & GFC_RTCHECK_DO)
 {
-  saved_dovar = gfc_create_var (type, ".saved_dovar");
+  saved_dovar = gfc_create_var (type, "_F_saved_dovar");
   gfc_add_modify_loc (loc, &block, saved_dovar, dovar);
 }
 


[RFA 1/5] New port: CR16: Remove "gdb" from noconfigdirs in configure.ac

2012-10-04 Thread Kaushik Phatak
Hi,
This patch is part of a patch set adding a new port (National Instruments CR16) to
gdb.
This patch will:
- Remove "gdb" from noconfigdirs in top-level configure.ac.
- Add target-libgloss
- Make target OS independent in config.sub

OK for gcc and binutils?

Regards,
Kaushik


2012-10-04  Kaushik Phatak  
Changelog
* configure.ac (noconfigdirs): Remove gdb. Add target-libgloss.
* configure: Regenerate.
* config.sub: Make target OS independent

--- ./gdb_src.orig/configure2012-06-28 17:20:52.0 +0530
+++ ./gdb_src/configure 2012-08-17 16:56:10.0 +0530
@@ -3447,7 +3447,7 @@ case "${target}" in
 noconfigdirs="$noconfigdirs target-libgloss gdb"
 ;;
   cr16-*-*)
-noconfigdirs="$noconfigdirs gdb"
+noconfigdirs="$noconfigdirs target-libgloss"
 ;;
   d10v-*-*)
 noconfigdirs="$noconfigdirs target-libgloss"
--- ./gdb_src.orig/configure.ac 2012-08-17 23:07:18.0 +0530
+++ ./gdb_src/configure.ac  2012-08-17 16:56:17.0 +0530
@@ -873,7 +873,7 @@ case "${target}" in
 noconfigdirs="$noconfigdirs target-libgloss gdb"
 ;;
   cr16-*-*)
-noconfigdirs="$noconfigdirs gdb"
+noconfigdirs="$noconfigdirs target-libgloss"
 ;;
   d10v-*-*)
 noconfigdirs="$noconfigdirs target-libgloss"
--- ./gdb_src.orig/config.sub   2012-04-25 21:23:25.0 +0530
+++ ./gdb_src/config.sub2012-09-24 11:26:00.0 +0530
@@ -575,7 +575,6 @@ case $basic_machine in
;;
cr16 | cr16-*)
basic_machine=cr16-unknown
-   os=-elf
;;
crds | unos)
basic_machine=m68k-crds




Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars

2012-10-04 Thread Janus Weil
2012/10/4 Tobias Burnus :
> Thanks for the suggestions. The attached patch changes all "."-something
> symbol names, which I found.
>
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk and 4.7? (".saved_dovar" also occurs in 4.6; we could also
> backport that part to 4.6, but I am not sure whether it is needed.)

I think at least for trunk it should be ok.


> We probably should also bump the .mod version (and timely also commit the
> patch http://gcc.gnu.org/ml/fortran/2012-04/msg00033.html).

Strong agreement here :)


> Comments?

When backporting to 4.6 and 4.7, do you intend to also bump the module
version there? Does that make sense?

Moreover, ".__result" probably goes back even further than 4.6, right?

Cheers,
Janus



> Am 04.10.2012 01:07, schrieb David Edelsohn:
>
>> For C and C++, identifiers beginning with underscore and upper case
>> letter or with two underscores are reserve to the implementation.  C++
>> uses _Z for mangling.
>>
>> Maybe Fortran could prepend "_F".  Something beginning with an
>> underscore seems like a much better choice, given the rules about
>> reserved identifiers.
>>
>> Thanks, David
>>
>> On Wed, Oct 3, 2012 at 5:00 PM, Tobias Burnus  wrote:
>>>
>>> David,
>>>
>>>
>>> David Edelsohn wrote:

>>>> I am not sure why you chose a period and how best to correct this.
>>>
>>>
>>> Well, in principle any name which the user cannot enter would do. (Not
>>> enter: At least not as valid Fortran identifier.)
>>>
>>> The reason for choosing "." is that  is used elsewhere in
>>> gfortran for such identifier for the string-length variable belonging to
>>> , e.g. "._result" in trans-decl.c. I assume the reason that it
>>> didn't pop up with those is that those are local variables, but I
>>> wouldn't
>>> be surprised if it would break elsewhere.
>>>
>>> I wonder whether "@" would work, otherwise, one could also use "_". The
>>> only
>>> other problem is that it will break the ABI. On the other hand, it's a
>>> rather new feature and if we bump the .mod version number, the chance
>>> that
>>> one effectively forces the user to re-compile is rather high. So far we
>>> always bumped the .mod version number as something changed. There are
>>> also
>>> some other patches pending which effectively lead to a bump in the .mod
>>> version.
>>>
>>> (The .mod version won't affect code which doesn't use modules such as
>>> BLAS/LAPACK or any Fortran 66/77 code, but those won't be affected by the
>>> ABI change anyway as there the name doesn't propagate as it does with
>>> modules..)
>>>
>>>
>>> Thanks for investigating the test-suite failure.
>>>
>>> Tobias
>
>


Re: [PATCH] Use __cxa_atexit on OpenBSD

2012-10-04 Thread Mark Kettenis
> Date: Wed, 3 Oct 2012 17:45:21 +0200 (CEST)
> From: Gerald Pfeifer 
> 
> On Sat, 15 Sep 2012, Ian Lance Taylor wrote:
> >> 2012-09-02  Mark Kettenis  
> >>
> >> * config.gcc (*-*-openbsd4.[3-9]|*-*-openbsd[5-9]*): Set
> >> default_use_cxa_atexit to yes.
> > This is OK.
> 
> I committed this to trunk and plan on doing so for the 4.7 branch
> as well (so that OpenBSD can benefit from a release branch of GCC
> carrying this) unless there are objections.

Thanks.  I won't object, although I'm not sure patching up the 4.7
branch is all that useful at this point given that it doesn't support
OpenBSD/amd64.

Any chance of getting the other diff that Ian approved

  

committed to trunk?

Thanks,

Mark


Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars

2012-10-04 Thread Janus Weil
>> Thanks for the suggestions. The attached patch changes all "."-something
>> symbol names, which I found.
>>
>> Build and regtested on x86-64-gnu-linux.
>> OK for the trunk and 4.7? (".saved_dovar" also occurs in 4.6; we could also
>> backport that part to 4.6, but I am not sure whether it is needed.)
>
> I think at least for trunk it should be ok.

One more comment: Since its appearance is a bit scattered in the code,
how about using a small macro which prepends the "_F" prefix to a
given variable name?
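
A minimal sketch of that idea, written in Python rather than as a C macro; the helper names are hypothetical, and the name shapes are taken from the format strings in Tobias's patch ("_F%s", "_F_%s_MOD_%s", "_F_saved_dovar").

```python
INTERNAL_PREFIX = "_F"  # prefix suggested in this thread


def internal_name(name):
    """Prepend the compiler-internal prefix in one central place."""
    return INTERNAL_PREFIX + name


def string_length_name(symbol, module=None):
    """Name of the hidden string-length variable for `symbol`.

    Mirrors the two format strings in the patch: "_F_%s_MOD_%s" for
    module symbols and "_F%s" otherwise.
    """
    if module:
        return internal_name(f"_{module}_MOD_{symbol}")
    return internal_name(symbol)


print(string_length_name("str"))
print(string_length_name("str", module="m"))
print(internal_name("_saved_dovar"))
```

Keeping the prefix in a single helper means a later change of convention touches one definition instead of every call site.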

Btw, note that we are using a double underscore scheme in other places
(like __class, __vtab, __vtype, etc). I have even used an '@' in one
place, namely (hidden) procedure pointer results ("ppr@"). Is there a
need to unify all those cases?

Cheers,
Janus



>> Am 04.10.2012 01:07, schrieb David Edelsohn:
>>
>>> For C and C++, identifiers beginning with underscore and upper case
>>> letter or with two underscores are reserve to the implementation.  C++
>>> uses _Z for mangling.
>>>
>>> Maybe Fortran could prepend "_F".  Something beginning with an
>>> underscore seems like a much better choice, given the rules about
>>> reserved identifiers.
>>>
>>> Thanks, David
>>>
>>> On Wed, Oct 3, 2012 at 5:00 PM, Tobias Burnus  wrote:

>>>> David,
>>>>
>>>>
>>>> David Edelsohn wrote:
>>>>>
>>>>> I am not sure why you chose a period and how best to correct this.
>>>>
>>>>
>>>> Well, in principle any name which the user cannot enter would do. (Not
>>>> enter: At least not as valid Fortran identifier.)
>>>>
>>>> The reason for choosing "." is that  is used elsewhere in
>>>> gfortran for such identifier for the string-length variable belonging to
>>>> , e.g. "._result" in trans-decl.c. I assume the reason that it
>>>> didn't pop up with those is that those are local variables, but I
>>>> wouldn't
>>>> be surprised if it would break elsewhere.
>>>>
>>>> I wonder whether "@" would work, otherwise, one could also use "_". The
>>>> only
>>>> other problem is that it will break the ABI. On the other hand, it's a
>>>> rather new feature and if we bump the .mod version number, the chance
>>>> that
>>>> one effectively forces the user to re-compile is rather high. So far we
>>>> always bumped the .mod version number as something changed. There are
>>>> also
>>>> some other patches pending which effectively lead to a bump in the .mod
>>>> version.
>>>>
>>>> (The .mod version won't affect code which doesn't use modules such as
>>>> BLAS/LAPACK or any Fortran 66/77 code, but those won't be affected by the
>>>> ABI change anyway as there the name doesn't propagate as it does with
>>>> modules..)
>>>>
>>>>
>>>> Thanks for investigating the test-suite failure.
>>>>
>>>> Tobias
>>
>>


Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars

2012-10-04 Thread Mikael Morin
Le 04/10/2012 12:17, Janus Weil a écrit :
> 
> When backporting to 4.6 and 4.7, do you intend to also bump the module
> version there? Does that make sense?
> 
If the module version is different from that of trunk, please don't, as
we would have two different module formats with the same module version.
I have proposed before to encode the gcc major version in the module
version, but there wasn't much interest back then.
There should be a better way to detect ABI incompatibilities, really.

Mikael


[testsuite] Add support for -fdump-ada-spec

2012-10-04 Thread Eric Botcazou
Hi,

the recent breakage of -fdump-ada-spec revealed that the switch isn't 
exercised in the testsuite.  This patch adds the missing support.

Tested on x86_64-suse-linux, OK for mainline?  What of the 4.7 branch?


2012-10-04  Eric Botcazou  

* lib/gcc-dg.exp (cleanup-ada-spec): New procedure.
* lib/scanasm.exp (get_ada_spec_filename): Likewise.
(scan-ada-spec): Likewise.
(scan-ada-spec-not): Likewise.
* gcc.dg/dump-ada-spec-1.c: New test.
* g++.dg/other/dump-ada-spec-1.C: Likewise.


-- 
Eric Botcazou

Index: lib/gcc-dg.exp
===
--- lib/gcc-dg.exp	(revision 192073)
+++ lib/gcc-dg.exp	(working copy)
@@ -503,6 +503,19 @@ proc cleanup-stack-usage { } {
 }
 }
 
+# Remove an Ada spec file for the current test.
+proc cleanup-ada-spec { } {
+set testcase [testname-for-summary]
+remove-build-file "[get_ada_spec_filename $testcase]"
+
+# Clean up files for additional source files.
+if [info exists additional_sources] {
+	foreach srcfile $additional_sources {
+	remove-build-file "[get_ada_spec_filename $srcfile]"
+	}
+}
+}
+
 # Remove all dump files with the provided suffix.
 proc cleanup-dump { suffix } {
 set testcase [testname-for-summary]
Index: lib/scanasm.exp
===
--- lib/scanasm.exp	(revision 192073)
+++ lib/scanasm.exp	(working copy)
@@ -184,6 +184,38 @@ proc scan-stack-usage-not { args } {
 dg-scan "scan-file-not" 0 $testcase $output_file $args
 }
 
+# Return the filename of the Ada spec corresponding to the argument.
+
+proc get_ada_spec_filename { testcase } {
+# The name might include a list of options; extract the file name.
+set filename  [lindex $testcase 0]
+set tailname  [file tail $filename]
+set extension [string trimleft [file extension $tailname] {.}]
+set rootname  [regsub -all {\-} [file rootname $tailname] {_}]
+
+return [string tolower "${rootname}_${extension}.ads"]
+}
+
+# Look for a pattern in the .ads file produced by the compiler.  See
+# dg-scan for details.
+
+proc scan-ada-spec { args } {
+set testcase  [testname-for-summary]
+set output_file "[get_ada_spec_filename $testcase]"
+
+dg-scan "scan-file" 1 $testcase $output_file $args
+}
+
+# Check that a pattern is not present in the .ads file produced by the
+# compiler.  See dg-scan for details.
+
+proc scan-ada-spec-not { args } {
+set testcase  [testname-for-summary]
+set output_file "[get_ada_spec_filename $testcase]"
+
+dg-scan "scan-file-not" 0 $testcase $output_file $args
+}
+
 # Call pass if pattern is present given number of times, otherwise fail.
 proc scan-assembler-times { args } {
 if { [llength $args] < 2 } {
Index: gcc.dg/dump-ada-spec-1.c
===
--- gcc.dg/dump-ada-spec-1.c	(revision 0)
+++ gcc.dg/dump-ada-spec-1.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-ada-spec" } */
+
+struct S
+{
+  int i;
+};
+
+/* { dg-final { scan-ada-spec "type S is record" } } */
+/* { dg-final { cleanup-ada-spec } } */
Index: g++.dg/other/dump-ada-spec-1.C
===
--- g++.dg/other/dump-ada-spec-1.C	(revision 0)
+++ g++.dg/other/dump-ada-spec-1.C	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-ada-spec" } */
+
+struct S
+{
+  int i;
+};
+
+/* { dg-final { scan-ada-spec "type S is record" } } */
+/* { dg-final { cleanup-ada-spec } } */
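
The file-naming rule implemented by get_ada_spec_filename in the patch above can be mirrored in Python as a quick way to see which .ads file a given test is expected to produce. This is only a sketch of the Tcl procedure's logic; the Python function name is hypothetical, and splitting on whitespace approximates Tcl's lindex.

```python
import os


def ada_spec_filename(testcase):
    """Approximate the Tcl proc get_ada_spec_filename from the patch."""
    # The test name might include a list of options; keep the file name.
    filename = testcase.split()[0]
    tailname = os.path.basename(filename)
    rootname, extension = os.path.splitext(tailname)
    extension = extension.lstrip(".")       # "c" or "C"
    rootname = rootname.replace("-", "_")   # dashes become underscores
    return f"{rootname}_{extension}.ads".lower()


print(ada_spec_filename("gcc.dg/dump-ada-spec-1.c"))
print(ada_spec_filename("g++.dg/other/dump-ada-spec-1.C -O2"))
```

Both new tests therefore map to the same spec file name, since the extension is lower-cased along with the rest of the name.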


Re: [Patch, Fortran, OOP] PR 54784: [4.7/4.8 Regression] wrong code in polymorphic allocation with SOURCE

2012-10-04 Thread Mikael Morin
Le 03/10/2012 18:48, Janus Weil a écrit :
> Hi all,
> 
> here is a small patch for a wrong-code regression with polymorphic
> allocation. The problem is that we falsely detect the allocation
> variable to be a polymorphic array (although it is a scalar). For
> further details see the PR, in particular comment 4.
> 
> Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.7?
Hello, the fix looks incomplete.

> 
> Index: gcc/fortran/trans-stmt.c
> ===
> --- gcc/fortran/trans-stmt.c  (revision 192004)
> +++ gcc/fortran/trans-stmt.c  (working copy)
> @@ -5145,7 +5145,9 @@ gfc_trans_allocate (gfc_code * code)
> dataref = actual->next->expr->ref;
> /* Make sure we go up through the reference chain to
>the _data reference, where the arrayspec is found.  */
> -   while (dataref->next && dataref->next->type != REF_ARRAY)
> +   while (!(dataref->type == REF_COMPONENT
> +&& strcmp (dataref->u.c.component->name, "_data") == 0)
> +  && dataref->next)
this stops on the first class reference, while it seems to me that it
should stop on the last.
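
To make the difference concrete, here is a small Python model of a reference chain, contrasting "stop at the first _data component" with "stop at the last one". The Ref class and both functions are hypothetical stand-ins for gfc_ref handling, not gfortran code, and the no-match case is simplified to returning None.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    kind: str        # "component" or "array" (simplified)
    name: str = ""   # component name, if any


def is_data_ref(ref):
    return ref.kind == "component" and ref.name == "_data"


def first_data_ref(refs):
    """Index of the first _data component (what the posted loop finds)."""
    for index, ref in enumerate(refs):
        if is_data_ref(ref):
            return index
    return None


def last_data_ref(refs):
    """Index of the last _data component (what the review asks for)."""
    found = None
    for index, ref in enumerate(refs):
        if is_data_ref(ref):
            found = index
    return found


# A class variable with a class component: x%_data%b%_data
chain = [Ref("component", "_data"), Ref("component", "b"),
         Ref("component", "_data")]
print(first_data_ref(chain), last_data_ref(chain))
```

With nested polymorphic components the two choices differ, which is the case the review comment is concerned about.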

Mikael


Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars

2012-10-04 Thread Janne Blomqvist
On Thu, Oct 4, 2012 at 1:50 PM, Janus Weil  wrote:
>>> Thanks for the suggestions. The attached patch changes all "."-something
>>> symbol names, which I found.
>>>
>>> Build and regtested on x86-64-gnu-linux.
>>> OK for the trunk and 4.7? (".saved_dovar" also occurs in 4.6; we could also
>>> backport that part to 4.6, but I am not sure whether it is needed.)
>>
>> I think at least for trunk it should be ok.
>
> One more comment: Since its appearance is a bit scattered in the code,
> how about using a small macro which prepends the "_F" prefix to a
> given variable name?

For "normal" identifiers in a module, the current scheme of
"__modulename_MOD_symbolname" is probably too widely entrenched e.g.
in debuggers and various interoperability toolkits to be worth
changing at this point. The OOP stuff OTOH is IMHO sufficiently new
that there is little harm in changing it.

I was thinking about this in the beginning of the year and produced
the attached document (I never sent it before as I realized I wouldn't
have time to do anything about it myself in the near future). Funnily,
I also came up with the idea of "_F" at that point, though maybe not
so surprising as I also studied the g++ name mangling for inspiration.
Also note that the document itself has a perhaps naive approach which
does not consider backwards compatibility enough (see e.g. the above
paragraph).

Some related mangling PR's:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51802

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52606 (not the PR itself,
but the discussion in the comments)

-- 
Janne Blomqvist
Gfortran name mangling ABI
==

It would be nice if GFortran had a documented, consistent name
mangling ABI. This would reduce the risk of an inadvertent name clash
with some other software, and make it easier for 3rd party tools such
as debuggers, profilers etc. to demangle symbol names.

If, and when, the ABI is broken due to the array descriptor update, we
could also think about fixing this issue.

An explicit non-goal of this is to come up with some common
cross-compiler name mangling ABI, as it seems very unlikely that other
compiler vendors will want to change their mangling, and mangling
alone is a very small part of ABI compatibility.

Another non-goal is to change the mangling of the "F77" interface
(lowercase, append underscore(s) depending on the compiler
options). Thus the following discussion refers only to mangling "F90+"
names, e.g. module procedures and so forth.

Rules of the road
-----------------

Some names are "reserved identifiers", reserved for the
implementation. Mangled names should be such reserved names, in order
to not clash with user-defined names.

Fortran specifies that names are of the form "letter + alphanum".

C and C++ reserved identifiers (that is, identifiers which are
reserved for use by the implementation) are

- A name beginning with an underscore followed by a capital letter.

- A name containing double underscore (plain C reserves only names
  beginning with double underscore, but C++ reserves anything with
  double underscores).

- POSIX adds additional restrictions wrt various POSIX functionality.

Thus, choosing names beginning with an underscore followed by either a
capital letter or another underscore should be good.
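
The two rules listed above can be expressed as a small Python check, useful for testing candidate prefixes. This is a sketch of the rules as stated in this note only (it ignores the POSIX additions), and the function names are hypothetical.

```python
def is_reserved_in_c(name):
    """Underscore followed by a capital letter or another underscore."""
    return (len(name) >= 2 and name[0] == "_"
            and (name[1] == "_" or "A" <= name[1] <= "Z"))


def is_reserved_in_cxx(name):
    """C++ additionally reserves any name containing a double underscore."""
    return is_reserved_in_c(name) or "__" in name


for candidate in ["_F_saved_dovar", "__bar_MOD_foo", "_gfortran_x", "foo__bar"]:
    print(candidate, is_reserved_in_c(candidate), is_reserved_in_cxx(candidate))
```

A name that passes is_reserved_in_c is safe under both sets of rules, which is the property the note is asking for.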

Current GFortran name mangling
------------------------------

Currently the name mangling is a bit ad-hoc, with several different
prefixes depending on which part of the compiler is used:

- Procedure "foo" in module "bar": __bar_MOD_foo

- Runtime library: _gfortran_XXX

- Runtime library internal functions (not visible if symbol visibility
  is supported): _gfortrani_XXX

- ISO_C_BINDING functions in the library: __iso_c_binding_XXX

- OOP stuff: __class_XXX

- Others?
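
As a small illustration of the first entry in the list above, the module-procedure scheme can be written as a one-line Python helper. The function name is hypothetical, and the lower-casing is an assumption of this sketch, made because Fortran names are case-insensitive.

```python
def module_procedure_symbol(module, procedure):
    """Symbol for `procedure` in `module` under the current scheme."""
    # Assumption: names are folded to lower case before mangling.
    return f"__{module.lower()}_MOD_{procedure.lower()}"


print(module_procedure_symbol("bar", "foo"))
print(module_procedure_symbol("Bar", "FOO"))
```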


The C++ name mangling
---------------------

For inspiration, see the C++ name mangling ABI that GCC follows at 

http://sourcery.mentor.com/public/cxx-abi/abi.html#mangling

http://sourcery.mentor.com/public/cxx-abi/abi-examples.html#mangling

http://sourcery.mentor.com/public/cxx-abi/abi-mangling.html


The C++ name mangling ABI, in a very simplified form, is

- Everything has the prefix "_Z".

- names are encoded as <length, name> pairs.

- At the end of a function symbol there is an "E", followed by the type
  of the function arguments (in order to handle overloading).

- Outside of names, characters have meaning as various flags,
  e.g. "TI" means the identifier is a typeinfo structure, or "TV" for
  a virtual table, and so on.

E.g. a member function "foo(void)" in a class "Test" (Test::foo())
would thus be encoded as "_ZN4Test3fooEv". (The "N" means it's a
nested name, "v" at the end means the void argument).
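
The example above can be reproduced with a few lines of Python. This sketch covers only the simplified subset described in this note (nested names built from length-prefixed components, plus a parameter code); the function name and its default argument are assumptions, and it is in no way a complete mangler.

```python
def mangle_nested(components, params="v"):
    """Mangle a nested name such as Test::foo() in the simplified scheme.

    components -- name parts from outermost to innermost
    params     -- already-encoded parameter types ("v" means void)
    """
    # Each component is written as its length followed by its text.
    encoded = "".join(f"{len(part)}{part}" for part in components)
    # "_Z" prefix, "N" ... "E" around the nested name, then the parameters.
    return f"_ZN{encoded}E{params}"


print(mangle_nested(["Test", "foo"]))
```

The length prefix is what lets a demangler split the name back into components without any separator characters.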


Proposed GFortran mangling
--------------------------

Fortran name mangling requirements are considerably simpler than C++,
due to Fortran not having function overloading (yes, Fortran has
generic interfaces, which are a bit different and don't require
mangling), nor templates.

- Every sy

Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars

2012-10-04 Thread Tobias Burnus

Hi Janus,

Janus Weil wrote:
When backporting to 4.6 and 4.7, do you intend to also bump the module 
version there? Does that make sense?


Probably not. The .__result and the .saved_dovar are not ABI relevant,
so they can be changed without problems, but changing them is also not
that important. The module variable is more crucial, but I think we should
avoid bumping the module version. I think we should only change the trunk.



Moreover, ".__result" probably goes back even further than 4.6, right?


Presumably yes.

Janus Weil wrote:

One more comment: Since its appearance is a bit scattered in the code,
how about using a small macro which prepends the "_F" prefix to a
given variable name?

Btw, note that we are using a double underscore scheme in other places
(like __class, __vtab, __vtype, etc). I have even used an '@' in one
place, namely (hidden) procedure pointer results ("ppr@"). Is there a
need to unify all those cases?


I think it would be useful to unify those. Are you volunteering?

Regarding the "@": GCC only has NO_DOLLAR_IN_LABEL and NO_DOT_IN_LABEL.
However, I would also be careful with the @ label.


The @ also has a special meaning to assemblers. However, if the name
with the @ is not publicly visible, the name does not occur in the
assembler file (except as a string in the debug information). Thus, there
should be no real problem. Still, one should consider changing those
to the _F convention as well.


Tobias


Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 11:43 AM, Steven Bosscher  wrote:
> On Wed, Oct 3, 2012 at 5:35 PM, Steven Bosscher  wrote:
>> The "worst" result is this:
>> Compressing live ranges: from 726174 to 64496 - 8%, pre_count 40476128, 
>> post_count 12483414
>>
>> But that's still a lot better than before the patch for the same function:
>> Compressing live ranges: from 1742569 to 73069 - 4%, pre_count 40842330, 
>> post_count 12479992
>
> Walking basic blocks with FOR_EACH_BB_REVERSE gives:
>
> Only FOR_EACH_BB_REVERSE:
> Compressing live ranges: from 1742579 to 429746 - 24% pre_count
> 41106212, post_count 34376494
> Compressing live ranges: from 1742569 to 63000 - 3% pre_count
> 40835340, post_count 11055747
>
> FOR_EACH_BB_REVERSE + need_curr_point_incr:
> Compressing live ranges: from 726184 to 416529 - 57% pre_count
> 40743516, post_count 34376846
> Compressing live ranges: from 726174 to 61840 - 8% pre_count 40472806,
> post_count 11055747
>
> The combination of the two changes takes ~20s off the ~180s for "LRA
> create live ranges".

Isn't _REVERSE vs. non-_REVERSE still kind-of "random" order?  Thus, doesn't
the above show there exists an optimal order for processing which we could use?
(I realize _REVERSE is a simple solution, but might there not exist a
pathological case where _REVERSE is even worse than non-_REVERSE?)

Richard.

> Ciao!
> Steven


Re: libgo patch committed: Use libbacktrace

2012-10-04 Thread Rainer Orth
Ian Lance Taylor  writes:

> This patch to libgo changes it to use libbacktrace.  Previously
> backtraces required the Go package debug/elf to register itself with the
> runtime during the package initialization, which only worked if the
> program actually imported debug/elf one way or another.  Bootstrapped
> and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to
> mainline.

Unfortunately, this breaks all use of libgo on versions of Solaris < 11
which lack strnlen:

Undefined                       first referenced
 symbol                             in file
strnlen                             /var/gcc/regression/trunk/10-gcc/build/i386-pc-solaris2.10/libgo/.libs/libgo.so
ld: fatal: symbol referencing errors. No output written to a.out
collect2: error: ld returned 1 exit status
FAIL: bufio

One could try to also link libiberty into libgo.la, but that has
the complication of needing to decide whether to use libiberty.a or
pic/libiberty.a, since libiberty is not a libtool library.

Alternatively, one could add another implementation of strnlen to libgo,
which duplicates code.
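
For concreteness, such a fallback could be as small as the sketch below.
The name fallback_strnlen is made up for this example (a real fix would
have to pick a name and a configure check), and it assumes a C11-style
memchr that stops at the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a replacement for strnlen on systems that lack it:
   returns the length of S, but never more than MAXLEN.  */
size_t
fallback_strnlen (const char *s, size_t maxlen)
{
  assert (s != NULL);
  /* Look for the terminating NUL within the first MAXLEN bytes.  */
  const char *end = memchr (s, '\0', maxlen);
  return end != NULL ? (size_t) (end - s) : maxlen;
}
```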

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: patch to fix

2012-10-04 Thread Richard Guenther
On Wed, Oct 3, 2012 at 7:15 PM, Kenneth Zadeck  wrote:
> The enclosed patch is the third of at least four patches that fix the
> problems associated with supporting integers on the target that are
> wider than two HOST_WIDE_INTs.
>
> While GCC claims to support OI mode, and we have two public ports that
> make minor use of this mode, in practice, compilation that uses OImode
> mode commonly gives the wrong result or ices.  We have a private port
> of GCC for an architecture that is further down the road to needing
> comprehensive OImode and we have discovered that this is unusable. We
> have decided to fix it in a general way that so that it is most
> beneficial to the GCC community.  It is our belief that we are just a
> little ahead of the X86 and the NEON and these patches will shortly be
> essential.
>
> The first two of these patches were primarily lexicographical and have
> already been committed.  They transformed the uses of CONST_DOUBLE
> so that it is easy to tell what the intended usage is.
>
> The underlying structures in the next two patches are very general:
> once they are added to the compiler, the compiler will be able to
> support targets with any size of integer from hosts of any size
> integer.
>
> The patch enclosed deals with the portable RTL parts of the compiler.
> The next patch, which is currently under construction, deals with the
> tree level.  However, this patch can be put on the trunk as is, and it
> will alleviate many, but not all, of the current limitations in the rtl
> parts of the compiler.
>
> Some of the patch is conditional, depending on a port defining the
> symbol 'TARGET_SUPPORTS_WIDE_INT' to be non zero.  Defining this
> symbol to be non zero is declaring that the port has been converted to
> use the new form of integer constants.  However, the patch is
> completely backwards compatible to allow ports that do not need this
> immediately to convert at their leisure.  The conversion process is
> not difficult, but it does require some knowledge of the port, so we
> are not volunteering to do this for all ports.
>
> OVERVIEW OF THE PATCH:
>
> The patch defines a new datatype, a 'wide_int' (defined in
> wide-int.[ch]), and this datatype will be used to perform all of the
> integer constant math in the compiler.  Externally, wide-int is very
> similar to double-int except that it does not have the limitation that
> math must be done on exactly two HOST_WIDE_INTs.
>
> Internally, a wide_int is a structure that contains a fixed sized
> array of HOST_WIDE_INTs, a length field and a mode.  The size of the

That it has a mode sounds odd to me and makes it subtly different
from HOST_WIDE_INT and double-int.  Maybe the patch will tell
why this is so.

> array is determined at generation time by dividing the number of bits
> of the largest integer supported on the target by the number of bits
> in a HOST_WIDE_INT of the host.  Thus, with this format, any length of
> integer can be supported on any host.
>
> A new rtx type is created, the CONST_WIDE_INT, which contains a
> garbage collected array of HOST_WIDE_INTS that is large enough to hold
> the constant.  For the targets that define TARGET_SUPPORTS_WIDE_INT to
> be non zero, CONST_DOUBLES are only used to hold floating point
> values.  If the target leaves TARGET_SUPPORTS_WIDE_INT defined as 0,
> CONST_WIDE_INTs are not used and CONST_DOUBLEs are as they were
> before.
>
> CONST_INT does not change except that it is defined to hold all
> constants that fit in exactly one HOST_WIDE_INT.  Note that this is slightly
> different than the current trunk.  Before this patch, the TImode
> constant '5' could either be in a CONST_INT or CONST_DOUBLE depending
> on which code path was used to create it.  This patch changes this so
> that if the constant fits in a CONST_INT then it is represented in a
> CONST_INT no matter how it is created.
>
> For the array inside a CONST_WIDE_INT, and internally in wide-int, we
> use a compressed form for integers that need more than one
> HOST_WIDE_INT.  Higher elements of the array are not needed if they
> are just a sign extension of the elements below them.  This does not
> imply that constants are signed or are sign extended, this is only a
> compression technique.
>
> While it might seem to be more esthetically pleasing to have not
> introduced the CONST_WIDE_INT and to have changed the representation
> of the CONST_INT to accommodate larger numbers, this would have both
> used more space and would be a time consuming change for the port
> maintainers.  We believe that most ports can be quickly converted with
> the current scheme because there is just not a lot of code in the back
> ends that cares about large constants.  Furthermore, the CONST_INT is
> very space efficient and even in a program that was heavy in large
> values, most constants would still fit in a CONST_INT.
>
> All of the parts of the rtl level that deal with CONST_DOUBLE as an
> now conditionally work with CONST_WIDE_INTs depending on

Re: [Fortran, (RFC) patch] PR49110/51055 Assignment to alloc. deferred-length character vars

2012-10-04 Thread Janus Weil
Hi,

>> One more comment: Since its appearance is a bit scattered in the code,
>> how about using a small macro which prepends the "_F" prefix to a
>> given variable name?
>>
>> Btw, note that we are using a double underscore scheme in other places
>> (like __class, __vtab, __vtype, etc). I have even used an '@' in one
>> place, namely (hidden) procedure pointer results ("ppr@"). Is there a
>> need to unify all those cases?
>
>
> I think it would be useful to unify those. Are you volunteering?

yeah, why not ;)

Attached is a draft patch (not regtested), which adds a macro
GFC_PREFIX (in gfortran.h) to prepend "_F" to the cases included in
Tobias' earlier patch as well as the OOP-related stuff and procedure
pointer results. It also bumps the module version.
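
For readers without the attachment, a macro of this kind could look roughly
like the sketch below. This is not the code from mangling.diff: the exact
spelling of the prefix (with or without a separator) is an assumption, the
helper has_gfc_prefix is hypothetical, and the macro only works on string
literals.

```c
#include <assert.h>
#include <string.h>

/* Sketch only: prepend "_F" to a string literal by relying on the
   concatenation of adjacent string literals.  */
#define GFC_PREFIX(name) "_F" name

/* Hypothetical helper: nonzero if NAME starts with the internal prefix.  */
int
has_gfc_prefix (const char *name)
{
  assert (name != NULL);
  return strncmp (name, GFC_PREFIX (""), strlen (GFC_PREFIX (""))) == 0;
}
```

Keeping the prefix in one place like this means a later change of the
convention touches a single definition.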

Any comments so far? (Of course the name of the macro can be debated.
I just tried to keep it short for now.)

Cheers,
Janus


mangling.diff
Description: Binary data


Re: [Patch] Fix PR53397

2012-10-04 Thread Richard Guenther
On Tue, Oct 2, 2012 at 6:40 PM, Kumar, Venkataramanan
 wrote:
> Hi Richi,
>
> (Snip)
>> + (!cst_and_fits_in_hwi (step))
>> +{
>> +  if( loop->inner != NULL)
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +{
>> +  fprintf (dump_file, "Reference %p:\n", (void *) ref);
>> +  fprintf (dump_file, "(base " );
>> +  print_generic_expr (dump_file, base, TDF_SLIM);
>> +  fprintf (dump_file, ", step ");
>> +  print_generic_expr (dump_file, step, TDF_TREE);
>> +  fprintf (dump_file, ")\n");
>
> No need to repeat this - all references are dumped when we gather them.
> (Snip)
>
> The dumping happens at "record_ref" which is called after these statements to 
> record these references.
>
> When the step is invariant  we return from the function without recording the 
> references.
>
>  so I thought of dumping the references here.
>
> Is there a cleaner way to dump the references at one place?

Yes, call dump_mem_ref then, instead of repeating parts of its body.

Richard.

> Regards,
> Venkat.
>
>
>
> -Original Message-
> From: Richard Guenther [mailto:rguent...@suse.de]
> Sent: Tuesday, October 02, 2012 5:42 PM
> To: Kumar, Venkataramanan
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [Patch] Fix PR53397
>
> On Mon, 1 Oct 2012, venkataramanan.ku...@amd.com wrote:
>
>> Hi,
>>
>> The below patch fixes the FFT/Scimark regression caused by useless
>> prefetch generation.
>>
>> This fix tries to make prefetch less aggressive by prefetching arrays
>> in the inner loop, when the step is invariant in the entire loop nest.
>>
>> GCC currently tries to prefetch invariant steps when they are in the
>> inner loop. But does not check if the step is variant in outer loops.
>>
>> In the scimark FFT case, the trip count of the inner loop varies by a
>> non constant step, which is invariant in the inner loop.
>> But the step variable is varying in outer loop. This makes inner loop
>> trip count small (at run time varies sometimes as small as 1
>> iteration)
>>
>> Prefetching ahead x iteration when the inner loop trip count is
>> smaller than x leads to useless prefetches.
>>
>> Flag used: -O3 -march=amdfam10
>>
>> Before
>> **  **
>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>> ** for details. (Results can be submitted to p...@nist.gov) **
>> **  **
>> Using   2.00 seconds min time per kenel.
>> Composite Score:  550.50
>> FFT Mflops:38.66(N=1024)
>> SOR Mflops:   617.61(100 x 100)
>> MonteCarlo: Mflops:   173.74
>> Sparse matmult  Mflops:   675.63(N=1000, nz=5000)
>> LU  Mflops:  1246.88(M=100, N=100)
>>
>>
>> After
>> **  **
>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>> ** for details. (Results can be submitted to p...@nist.gov) **
>> **  **
>> Using   2.00 seconds min time per kenel.
>> Composite Score:  639.20
>> FFT Mflops:   479.19(N=1024)
>> SOR Mflops:   617.61(100 x 100)
>> MonteCarlo: Mflops:   173.18
>> Sparse matmult  Mflops:   679.13(N=1000, nz=5000)
>> LU  Mflops:  1246.88(M=100, N=100)
>>
>> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu
>> New tests that PASS:
>>
>> gcc.dg/pr53397-1.c scan-assembler prefetcht0 gcc.dg/pr53397-1.c
>> scan-tree-dump aprefetch "Issued prefetch"
>> gcc.dg/pr53397-1.c (test for excess errors) gcc.dg/pr53397-2.c
>> scan-tree-dump aprefetch "loop variant step"
>> gcc.dg/pr53397-2.c scan-tree-dump aprefetch "Not prefetching"
>> gcc.dg/pr53397-2.c (test for excess errors)
>>
>>
>> Checked CPU2006 and polyhedron on latest AMD processor, no regressions noted.
>>
>> Ok to commit in trunk?
>>
>> regards,
>> Venkat
>>
>> gcc/ChangeLog
>> +2012-10-01  Venkataramanan Kumar  
>> +
>> +   * tree-ssa-loop-prefetch.c (gather_memory_references_ref):
>> +   Perform non constant step prefetching in inner loop, only
>> +   when it is invariant in the entire loop nest.
>> +   * testsuite/gcc.dg/pr53397-1.c: New test case.
>> +   Checks we are prefetching for loop invariant steps.
>> +   * testsuite/gcc.dg/pr53397-2.c: New test case.
>> +   Checks we are not prefetching for loop variant steps.
>> +
>>
>>
>> Index: gcc/testsuite/gcc.dg/pr53397-1.c
>> ===
>> --- gcc/testsuite/gcc.dg/pr53397-1.c  (revision 0)
>> +++ gcc/testsuite/gcc.dg/pr53397-1.c  (revision 0)
>> @@ -0,0 +1,28 @@
>> +/* Prefetching when the step is loop invariant.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3 -fprefetch-loop-arrays
>> +-fdump-tree-aprefetch-details --param min-insn-to-pre

Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 1:30 PM, Richard Guenther
 wrote:
> Isn't _REVERSE vs. non-_REVERSE still kind-of "random" order?

Not at this stage. For cfglayout mode I would answer yes, but IRA/LRA
operates in cfgrtl mode, so the sequence of insns and basic blocks
must match. Therefore, if you walk the basic blocks in reverse, and
the insns in each basic block in reverse, you effectively work on a,
let's say, "reverse extended basic block" (??) in that your program
points are sequential across fallthrough edges.
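
A toy model of that claim, as a sketch only: it assumes one program point
per insn, straight-line blocks of equal size laid out in order, and made-up
helper names (assign_points, max_fallthrough_gap). It is not the lra-lives.c
code.

```c
#include <assert.h>
#include <stdlib.h>

/* Insns are numbered in layout order; block B holds insns
   [B * PER_BLOCK, (B + 1) * PER_BLOCK).  Each block is walked from its
   last insn to its first, and blocks are walked in forward or reverse
   layout order depending on REVERSE_BLOCKS.  */
void
assign_points (int nblocks, int per_block, int reverse_blocks, int *point)
{
  assert (nblocks > 0 && per_block > 0 && point != NULL);
  int curr_point = 0;
  for (int k = 0; k < nblocks; k++)
    {
      int b = reverse_blocks ? nblocks - 1 - k : k;
      for (int i = per_block - 1; i >= 0; i--)
        point[b * per_block + i] = curr_point++;
    }
}

/* Largest distance in program points between the last insn of a block
   and the first insn of its fallthrough successor.  */
int
max_fallthrough_gap (int nblocks, int per_block, const int *point)
{
  int worst = 0;
  for (int b = 0; b + 1 < nblocks; b++)
    {
      int gap = abs (point[(b + 1) * per_block - 1]
                     - point[(b + 1) * per_block]);
      if (gap > worst)
        worst = gap;
    }
  return worst;
}
```

With three blocks of four insns each, forward block order leaves a gap of 7
points across each fallthrough edge, while reverse block order leaves a gap
of 1, i.e. the points are consecutive.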

>  Thus, doesn't
> the above show there exists an optimal order for processing which we could 
> use?

There may be a smarter order: you could even walk the blocks in that order
if you knew a priori which path through the CFG minimizes the length of
the live range chains. But to know that order, you have to build the
chains. So chicken-and-egg...

> (I realize _REVERSE is a simple solution, but might there not exist a
> pathological case where _REVERSE is even worse than non-_REVERSE?)

Intuitively, I'm certain that _REVERSE is always better than
non-_REVERSE, although I don't know how to prove that :-)

Ciao!
Steven


Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-10-04 Thread Iain Buclaw
I would like to get a bump on this.

It's been a while, and there have been quite a number of changes since
the initial post that address many of the issues raised.  Rather than
reposting the patches, someone suggested attaching a changelog, so here
it is.

Repository is still located here: https://github.com/D-Programming-GDC/GDC

Would it be possible to have a renewed review?  I'm still keen on
pushing this, however I'm not certain of the right plan of execution.
:-)


Thanks,
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


ChangeLog.gz
Description: GNU Zip compressed data


Re: patch to fix

2012-10-04 Thread Marc Glisse

On Wed, 3 Oct 2012, Mike Stump wrote:

On Oct 3, 2012, at 1:47 PM, Marc Glisse  wrote:
did you consider making the size of wide_int a template parameter, now 
that we are using C++? All with a convenient typedef or macro so it 
doesn't show. I am asking because in vrp I do some arithmetic that 
requires 2*N+1 bits where N is the size of double_int.


No, not really.  I'd maybe answer it this way, we put in a type 
(singular) to support all integral constants in all languages on a port. 
Since we only needed 1, there was little need to templatize it.  By 
supporting all integral constants in all languages, there is little need 
for more.  If Ada say, wanted a 2048 bit integer, then, we just have it 
drop off the size it wants someplace and we would mix that in on a 
MAX(….) line, net result, the type we use would then directly support 
the needs of Ada.  If vrp wanted 2x of all existing modes, we could
simply change the MAX equation and essentially double it, if people need
that.  This comes at a cost, as the intermediate wide values are fixed
size allocated (not variable), so these all would be larger.


And this cost could be eliminated by having a template wide_int_ so only 
the places that need it actually use the extra size ;-)



On Wed, 3 Oct 2012, Kenneth Zadeck wrote:

i have already converted the vrp code, so i have some guess at where you are 
talking about.  (of course correct me if i am wrong).


the code that computes the range when two variables are multiplied
together needs to do a multiplication that produces a result that is twice as
wide as the inputs.


Yes, exactly.

my library is able to do that with one catch (and this is a big catch): the 
target has to have an integer mode that is twice as big as the mode of the 
operands. The issue is that wide-ints actually carry around the mode of the 
value in order to get the bitsize and precision of the operands (it does not 
have the type, because this code has to both work on the rtl and tree level 
and i generally do not want the signedness anyway).


my current code in vrp checks to see if such a mode exists and if it does, it 
produces the product.   if the mode does not exist, it returns bottom.   What 
this means is that for most (many or some) targets that have a TImode, the
largest thing that vrp can discover ranges for is a DImode value.   We
could get around this by defining the next larger mode than what the target 
really needs but i wonder how much mileage you are going to get out of that 
with really large numbers.


This will be for discussion when you submit that next patch, but currently 
VRP handles integers the same size as double_int. In particular, it 
handles __int128. I would be unhappy if introducing a larger bigint type 
in gcc made us regress there.



--
Marc Glisse


Re: Propagate profile counts during switch expansion

2012-10-04 Thread Jan Hubicka
> Hi,
>  This patch propagates the profile counts during RTL expansion. In
> many cases, there is no way to determine the exact count of an edge
> generated during the expansion. So this patch uses some simple
> heuristics to estimate the edge counts but ensures that the counts of
> the basic blocks corresponding to the cases are (nearly the) same as
> at the gimple level.
> 
> Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for 
> trunk?
> Index: gcc/expr.c
> ===
> --- gcc/expr.c (revision 191879)
> +++ gcc/expr.c (working copy)
> @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin
>  #ifdef PUSH_ROUNDING
>  static void emit_single_push_insn (enum machine_mode, rtx, tree);
>  #endif
> -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
> +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
>  static rtx const_vector_from_tree (tree);
>  static void write_complex_part (rtx, rtx, bool);
> 
> @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree
> 
>  static void
>  do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
> -  rtx default_label)
> +  rtx default_label, int default_probability)

Please document default_probability.
>  {
>rtx temp, vector;
> 
> @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r
>   the maximum value of the range.  */
> 
>if (default_label)
> -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
> - default_label);
> +{
> +  emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
> +   default_label);
> +  if (default_probability != -1)
> +{
> +  rtx jump_insn = get_last_insn();
> +  add_reg_note (jump_insn, REG_BR_PROB, GEN_INT 
> (default_probability));
> +}
> +}

dojump already does this kind of logic, but it is a bit more careful:
  emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp,
   if_true_label);
  if (prob != -1 && profile_status != PROFILE_ABSENT)
{ 
  for (last = NEXT_INSN (last);
   last && NEXT_INSN (last);
   last = NEXT_INSN (last))
if (JUMP_P (last))
  break;
  if (last
  && JUMP_P (last)
  && ! NEXT_INSN (last)
  && any_condjump_p (last))
{ 
  gcc_assert (!find_reg_note (last, REG_BR_PROB, 0));
  add_reg_note (last, REG_BR_PROB, GEN_INT (prob));
}
}

What about making emit_cmp_and_jump_insns take the probability argument
and moving the code above inside?  Most of the other places need updating to
propagate probabilities.

(compare_and_jump_seq in loop-unswitch probably also can be updated)
> @@ -10954,7 +10962,7 @@ do_tablejump (rtx index, enum machine_mode mode, r
> 
>  int
>  try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
> -   rtx table_label, rtx default_label)
> +   rtx table_label, rtx default_label, int default_probability)

Similarly here.
> Index: gcc/cfgbuild.c
> ===
> --- gcc/cfgbuild.c (revision 191879)
> +++ gcc/cfgbuild.c (working copy)
> @@ -533,6 +533,23 @@ find_bb_boundaries (basic_block bb)
>  purge_dead_tablejump_edges (bb, table);
>  }
> 
> +/* If there is at least one edge in EDGES with a non-zero count, then
> +   compute probabilities based on the existing counts.  */
> +
> +static bool
> +gen_probabilities_from_existing_counts ( VEC(edge,gc) *edges) {
> +  edge e;
> +  edge_iterator ei;
> +  gcov_type count_sum = 0;
> +  FOR_EACH_EDGE(e, ei, edges)
> +count_sum += e->count;
> +  if (count_sum == 0)
> +return false;
> +  FOR_EACH_EDGE(e, ei, edges)
> +e->probability = e->count * REG_BR_PROB_BASE / count_sum;
> +  return true;
> +}
> +
>  /*  Assume that frequency of basic block B is known.  Compute frequencies
>  and probabilities of outgoing edges.  */
> 
> @@ -560,7 +577,6 @@ compute_outgoing_frequencies (basic_block b)
>return;
>   }
>  }
> -
>if (single_succ_p (b))
>  {
>e = single_succ_edge (b);
> @@ -568,7 +584,10 @@ compute_outgoing_frequencies (basic_block b)
>e->count = b->count;
>return;
>  }
> -  guess_outgoing_edge_probabilities (b);
> +  else if (!gen_probabilities_from_existing_counts (b->succs)){
> +/* All outgoing edges of B have zero count. Guess probabilities.  */
> +guess_outgoing_edge_probabilities (b);
> +  }

Hmm, I do not quite follow the logic here.
Basic block B is one of many basic blocks that the original BB was split into.
It is possible that B may have some of the original edges, but there may be new
ones.
How can you guess the outgoing probabilities here?  Do you have an example?
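
For reference, the arithmetic of the quoted helper can be sketched outside
of GCC as below. Plain arrays stand in for the edge vector, the function
name is made up, and REG_BR_PROB_BASE is defined locally as 10000 to match
GCC's value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REG_BR_PROB_BASE 10000

/* Scale each count to a probability out of REG_BR_PROB_BASE.
   Returns false, leaving PROBABILITY untouched, if all counts are zero.  */
bool
probabilities_from_counts (const int64_t *count, int *probability, size_t n)
{
  int64_t sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (count[i] >= 0);
      sum += count[i];
    }
  if (sum == 0)
    return false;
  /* Integer division truncates, so the results may not add up to
     exactly REG_BR_PROB_BASE.  */
  for (size_t i = 0; i < n; i++)
    probability[i] = (int) (count[i] * REG_BR_PROB_BASE / sum);
  return true;
}
```

Note that this only redistributes whatever counts the edges already carry;
an edge that is new after the split and has count 0 gets probability 0,
which is the case the question above is about.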

Also gen_probabilities_from_existing_counts could probably also wo

Re: [RFA 1/5] New port: CR16: Remove "gdb" from noconfigdirs in configure.ac

2012-10-04 Thread Pedro Alves
On 10/04/2012 11:17 AM, Kaushik Phatak wrote:
> Hi,
> This patch is one of patch set to add a new port (National Instruments CR16) 
> in gdb.
> This patch will,
> - Remove "gdb" from noconfigdirs in top-level configure.ac.
> - Add target-libgloss

I can't see how adding target-libgloss is relevant to a gdb port.
Please keep separate logical changes in separate logical patches, each
with its own description/justification.

> - Make target OS independent in config.sub

config.sub is maintained elsewhere.  The right procedure is to send the change
upstream, and then merge upstream into our tree.
Please see the head of the script for instructions.

-- 
Pedro Alves



Re: RFA: darwin PATCH to fix build, internal visibility

2012-10-04 Thread Jack Howarth
On Wed, Oct 03, 2012 at 03:26:14PM -0700, Mike Stump wrote:
> On Oct 3, 2012, at 12:04 PM, Jason Merrill  wrote:
> > This patch fixes a couple of Darwin issues I noticed with a cross-compiler:
> > 
> > 1) Adds a couple of consts to avoid const-correctness errors.
> > 2) Treats visibility "internal" like "hidden" rather than like "default".  
> > The documentation says that internal is hidden + processor-specific 
> > semantics, so treating it as just hidden makes sense to me.
> > 
> > OK for trunk?
> 
> Ok.


FYI, the x86_64-apple-darwin12 testsuite results with the proposed patch are 
at...

http://gcc.gnu.org/ml/gcc-testresults/2012-10/msg00434.html

It appears that the patch should also special case the "scan-assembler 
.internal.*Foo.methodEv"
tests in g++.dg/ext/visibility/pragma-override1.C and 
g++.dg/ext/visibility/pragma-override2.C 
on darwin as well...

FAIL: g++.dg/ext/visibility/pragma-override1.C -std=c++98  scan-assembler 
.internal.*Foo.methodEv
FAIL: g++.dg/ext/visibility/pragma-override1.C -std=c++11  scan-assembler 
.internal.*Foo.methodEv
FAIL: g++.dg/ext/visibility/pragma-override2.C -std=c++98  scan-assembler 
.internal.*Foo.methodEv
FAIL: g++.dg/ext/visibility/pragma-override2.C -std=c++11  scan-assembler 
.internal.*Foo.methodEv

   Jack



Re: patch to fix constant math

2012-10-04 Thread Kenneth Zadeck

Let me talk about the mode here first.

What this interface/patch provides is a facility where the constant math 
that is done in optimizations is done exactly the way that it would be 
done on the target machine.   What we have now is a compiler that only 
does this if it is convenient to do on the host.   I admit that i care
about this more than others right now, but if intel adds a couple of 
more instructions to their vector units, other people will start to 
really care about this issue.   If you take an OImode value with the 
current compiler and left shift it by 250 the middle end will say that 
the result is 0.   This is just wrong!!!


What this means is that the bitsize and precision of the operations need
to be carried along when doing math.  When wide-int checks for overflow
on the multiply or add, it is not checking if the value overflowed
on two HWIs, it is checking if the add overflowed in the mode of the
types that are represented on the target.   When we do shift, we are not
doing a shift within two HWIs, we are truncating the shift value (if
this is appropriate) according to the bitsize and shifting according to the
precision.
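
As an illustration of the shift example above, here is a sketch of a left
shift carried out at the full 256-bit precision of an OImode-sized value,
stored as four 64-bit words with the least significant word first. The
function name and layout are assumptions made for this example; it is not
the wide-int implementation.

```c
#include <assert.h>
#include <stdint.h>

/* OUT = IN << SHIFT, computed on 256 bits; SHIFT must be below 256.  */
void
shl256 (const uint64_t in[4], unsigned shift, uint64_t out[4])
{
  assert (shift < 256);
  int word_shift = (int) (shift / 64);
  unsigned bit_shift = shift % 64;

  for (int i = 3; i >= 0; i--)
    {
      uint64_t v = 0;
      int src = i - word_shift;
      if (src >= 0)
        {
          v = in[src] << bit_shift;
          /* Bring in the bits that cross a word boundary.  */
          if (bit_shift != 0 && src >= 1)
            v |= in[src - 1] >> (64 - bit_shift);
        }
      out[i] = v;
    }
}
```

Shifting the value 1 left by 250 sets bit 58 of the top word and leaves the
two low words zero, so a computation that only keeps two host words sees 0
even though the 256-bit result is nonzero.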


I think that an argument could be made that storing the mode should be 
changed to an explicit precision and bitsize.  (A possible other option 
would be to store a tree type, but this would make the usage at the rtl 
level very cumbersome since types are rare.) Aside from the work, you 
would not get much push back.


But the signedness is a different argument.   At the rtl level, the signedness
is a matter of context.   (you could argue that this is a mistake and i
would agree, but that is an even bigger change.)   But more to the 
point, at the tree level, there are a surprising number of places where 
the operation desired does not follow the sign of the types that were 
used to construct the constants.   Furthermore, not carrying the sign is 
more consistent with the double int code, which as you point out carries 
nothing.


As for the splitting out the patch in smaller pieces, i am all for it.   
I have done this twice already and i could get the const_scalar_int_p 
patch out quickly.   But you do not get too far along that before you
are still left with a big patch.   I could split out wide-int.* and just 
commit those files with no clients as a first step.   My guess is that 
Richard Sandiford would appreciate that because while he has carefully 
checked the rtl stuff, i think that the code inside wide-int is not in 
his comfort zone of things he would approve.


As far as your btw - noticed this last night.   it is an artifact of the 
way i produced the patch and "responsible people have been sacked".   
However, it shows that you read the patch carefully, and i really 
appreciate that.   i owe you a beer (not that you need another at this 
time of year).


Kenny



On 10/04/2012 08:48 AM, Richard Guenther wrote:

On Wed, Oct 3, 2012 at 7:15 PM, Kenneth Zadeck  wrote:

The enclosed patch is the third of at least four patches that fix the
problems associated with supporting integers on the target that are
wider than two HOST_WIDE_INTs.

While GCC claims to support OI mode, and we have two public ports that
make minor use of this mode, in practice, compilation that uses OImode
mode commonly gives the wrong result or ices.  We have a private port
of GCC for an architecture that is further down the road to needing
comprehensive OImode and we have discovered that this is unusable. We
have decided to fix it in a general way that so that it is most
beneficial to the GCC community.  It is our belief that we are just a
little ahead of the X86 and the NEON and these patches will shortly be
essential.

The first two of these patches were primarily lexicographical and have
already been committed.  They transformed the uses of CONST_DOUBLE
so that it is easy to tell what the intended usage is.

The underlying structures in the next two patches are very general:
once they are added to the compiler, the compiler will be able to
support targets with any size of integer from hosts of any size
integer.

The patch enclosed deals with the portable RTL parts of the compiler.
The next patch, which is currently under construction, deals with the
tree level.  However, this patch can be put on the trunk as is, and it
will alleviate many, but not all, of the current limitations in the rtl
parts of the compiler.

Some of the patch is conditional, depending on a port defining the
symbol 'TARGET_SUPPORTS_WIDE_INT' to be non zero.  Defining this
symbol to be non zero is declaring that the port has been converted to
use the new form of integer constants.  However, the patch is
completely backwards compatible to allow ports that do not need this
immediately to convert at their leisure.  The conversion process is
not difficult, but it does require some knowledge of the port, so we
are not volunteering to do this for all ports.

OVERVIEW OF THE PATCH:

The patch defines a new datatype,

Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-04 Thread Jan Hubicka
Hi,
this patch implements -fprofile-consistency-report, which is useful for getting
statistics about which passes are the major offenders in failing to keep the
profile up-to-date.

For example the following is output for combine.c
 Pass: fnsplit      (after pass) mismatched in:  +16 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: fnsplit      (after TODO) mismatched in:  -16 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: inline       (after pass) mismatched in: +197 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: inline       (after TODO) mismatched in: -209 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: ccp          (after TODO) mismatched in:   +8 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: vrp          (after pass) mismatched in: +191 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: vrp          (after TODO) mismatched in:  +25 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: dce          (after TODO) mismatched in:  -19 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: cdce         (after pass) mismatched in:   -1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: cselim       (after pass) mismatched in:   +1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: ifcombine    (after TODO) mismatched in:   +1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: phiopt       (after pass) mismatched in:   -2 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: ch           (after pass) mismatched in:   +2 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: ch           (after TODO) mismatched in:   +1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: dom          (after pass) mismatched in:  +89 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: dom          (after TODO) mismatched in:   -3 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: phicprop     (after TODO) mismatched in:   -6 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: dce          (after TODO) mismatched in:   -2 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: copyprop     (after TODO) mismatched in:  -17 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: unswitch     (after pass) mismatched in:   +7 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: unswitch     (after TODO) mismatched in:  +19 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: cunroll      (after pass) mismatched in:  +10 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: vrp          (after pass) mismatched in:  +18 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: vrp          (after TODO) mismatched in:   -8 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: dom          (after pass) mismatched in:  +14 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: dom          (after TODO) mismatched in:   +4 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: phicprop     (after TODO) mismatched in:   +1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: cddce        (after TODO) mismatched in:   -1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: expand       (after pass) mismatched in: +435 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: jump         (after pass) mismatched in:   +6 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: cse1         (after pass) mismatched in:   +1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: cprop        (after pass) mismatched in:   -8 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: rtl pre      (after pass) mismatched in:   -1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: cse_local    (after pass) mismatched in:   -7 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: ce1          (after pass) mismatched in:   +5 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: loop2_init   (after pass) mismatched in:   +1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: loop2_done   (after pass) mismatched in:   -1 (freqs)   +0 (counts); michmatched out:   +0 (freqs)   +0 (counts)
 Pass: reload 
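
As a rough illustration of what an incoming-frequency mismatch could mean, the following C sketch counts basic blocks whose frequency differs from the sum of the frequencies on their incoming edges. It is not the code from the patch: `toy_edge`, `toy_count_mismatched_in` and the `tolerance` parameter are hypothetical names invented for this example, and the exact rule the report applies is an assumption here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy CFG edge; not a GCC data structure.  */
struct toy_edge
{
  int src;        /* index of the source block */
  int dest;       /* index of the destination block */
  int frequency;  /* estimated frequency of taking this edge */
};

/* Count blocks whose frequency differs from the sum of the frequencies
   of their incoming edges by more than TOLERANCE.  Block 0 is treated
   as the entry block and skipped, since it has no incoming edges.  */
int
toy_count_mismatched_in (const int *bb_frequency, int n_blocks,
                         const struct toy_edge *edges, int n_edges,
                         int tolerance)
{
  assert (n_blocks > 0 && n_edges >= 0 && tolerance >= 0);

  int mismatched = 0;
  for (int b = 1; b < n_blocks; b++)
    {
      int sum = 0;
      for (int e = 0; e < n_edges; e++)
        if (edges[e].dest == b)
          sum += edges[e].frequency;
      if (abs (sum - bb_frequency[b]) > tolerance)
        mismatched++;
    }
  return mismatched;
}
```

Under this reading, a pass that rewires edges without adjusting block frequencies raises the count, and a later cleanup that rebuilds the frequencies lowers it again.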

Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 4:01 PM, Jan Hubicka wrote:
> * doc/invoke.texi (-fprofile-consistency-report): Document.
> * common.opt (fprofile-consistency-report): New.
> * toplev.h (dump_profile_consistency_report): Declare.
> * toplev.c (finalize): Call dump_profile_consistency_report.
> * passes.c (profile_record): New global var.
> (check_profile_consistency): New function.
> (dump_profile_consistency_report): New function.
> (execute_one_ipa_transform_pass): Call check_profile_consistency.
> (execute_one_pass): Likewise.


Nice. And long overdue! :-)


> +fprofile-consistency-report
> +Common Report Var(profile_report)
> +Report on consistency of profile

Maybe make this a -d flag instead of -f?


> Index: passes.c
> +/* Hold statistic about profile consistency.  */
...

I don't see why this should live in passes.c, can you please put it in
a more logical place (profile.c, perhaps)?

Ciao!
Steven


Re: RFA: add lock_length attribute to break branch-shortening cycles

2012-10-04 Thread Joern Rennecke

Quoting Richard Guenther :


I miss a few things in this description:
- what is the value of lock_length supposed to be?  From the "lock"
  prefix it sounds like it is something unchanging, maybe even constant,
  thus a maximum?
- the length attribute still needs to be specified when lock_length is?
  how do they relate?  Is lock_length always smaller / bigger than length?


I hope I have clarified this in the updated documentation:

 Usually, branch shortening is done assuming the worst case (i.e.
longest) lengths, and then iterating (if optimizing) to smaller lengths
till no further changes occur.  This does not work so well for
architectures that have very small minimum offsets and considerable
jumps in instruction lengths.

 If you define the `lock_length' attribute, branch shortening will work
the other way round: it starts out assuming minimum instruction lengths
and iterates from there.  `lock_length' specifies an instruction length
value that is calculated like `length' in every iteration, but if the
value at the last iteration was larger, that larger previous value will
be used instead.  The value computed for the `length' attribute will be
no smaller than that of the `lock_length' attribute, but you may still
specify a larger value, and in that case `length' can decrease in the
next iteration, but not below the value of `lock_length'.  The easiest
way to make sure branch shortening doesn't start looping indefinitely
is to define the `length' attribute to only a minimum instruction
length for varying length instructions and let the `lock_length'
attribute do all the heavy lifting of computing varying lengths.  On
the other hand, for some instruction sets you might get better
shortening outcomes if you use a more complex `length' attribute and
use `lock_length' only to the extent required to prevent indefinite
looping.  Note that `ADJUST_INSN_LENGTH' applies only to the `length'
attribute.

 Because defining `lock_length' makes branch shortening start with
optimistic assumptions, this means we have to see it through to the end
to eliminate all out-of-range branches, thus branch shortening might
take a bit longer at `-O0' than if you didn't define the `lock_length'
attribute.

 Using `lock_length' with varying delay slots and varying length delay
slot insns (all at once) is currently not supported.
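
The optimistic iteration described above can be sketched as a small standalone C model. This is not final.c code: the `toy_insn` type, the `SHORT_LEN`/`LONG_LEN`/`SHORT_RANGE` constants and the distance rule are all assumptions made up for illustration. The only point it tries to show is the "lock": a length, once grown, is never allowed to shrink again, so the iteration must terminate.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_INSNS 64

/* Toy instruction: either a fixed-size insn or a branch to TARGET.  */
struct toy_insn
{
  int is_branch;  /* nonzero for a branch */
  int target;     /* index of the branch target insn */
  int fixed_len;  /* length in bytes of a non-branch insn */
};

/* Hypothetical encoding: a short branch reaches SHORT_RANGE bytes.  */
enum { SHORT_LEN = 2, LONG_LEN = 6, SHORT_RANGE = 8 };

/* Byte address of insn I under the current length assignment LEN.  */
int
toy_address (const int *len, int i)
{
  int addr = 0;
  for (int k = 0; k < i; k++)
    addr += len[k];
  return addr;
}

/* Start from minimum lengths and iterate to a fixed point.  Each
   branch length is recomputed every round, but the larger of the new
   and the previous value is kept.  Returns the number of rounds.  */
int
toy_shorten_branches (const struct toy_insn *insn, int n, int *len)
{
  assert (n > 0 && n <= MAX_INSNS);

  for (int i = 0; i < n; i++)
    len[i] = insn[i].is_branch ? SHORT_LEN : insn[i].fixed_len;

  int iterations = 0;
  int changed;
  do
    {
      changed = 0;
      iterations++;
      for (int i = 0; i < n; i++)
        {
          if (!insn[i].is_branch)
            continue;
          int dist = abs (toy_address (len, insn[i].target)
                          - toy_address (len, i));
          int computed = dist <= SHORT_RANGE ? SHORT_LEN : LONG_LEN;
          /* The "lock": never drop below the previous value.  */
          int locked = computed > len[i] ? computed : len[i];
          if (locked != len[i])
            {
              len[i] = locked;
              changed = 1;
            }
        }
    }
  while (changed);
  return iterations;
}
```

Because the loop starts from lengths that may be too small, it cannot stop after the first round the way a pessimistic scheme can; it has to run until nothing changes.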


- what happens if you have patterns with lock_length and patterns without?
- what patterns does lock_length apply to?


These questions don't really make literal sense; instruction attributes
are defined for all instructions.  Of course, you can define the default
value to 0, which means you see no effect of this attribute on a pattern
unless some non-default definition applies.


In general optimistically attacking this kind of problem should be always
better - did you try simply switching this for all targets?


No, I haven't.  The boundaries to be set for branch offsets can be different
when starting with optimistic assumptions than when starting with
pessimistic assumptions.  Moreover, there could be 'interesting' interactions
with ADJUST_INSN_LENGTH.  And there is the problem with handling varying
delay slots.  I have modified the final.c patch to reduce the scope of this
issue so that lock_length should hopefully be useful for a wider range of
targets.

Also, ...


 It shouldn't be
slower and the only thing you need to guarantee is that during iteration
you never make insn-lengths smaller again.


At -O0 it is slower because we can't finish the loop early.

For all these reasons, I think it is better if each target maintainer
evaluates whether the better branch shortening outweighs the longer -O0
compilation time, and addresses any issues arising if/when converting.

Bootstrapped on i686-pc-linux-gnu
Index: doc/md.texi
===
--- doc/md.texi (revision 192036)
+++ doc/md.texi (working copy)
@@ -8004,6 +8004,42 @@ (define_insn "jump"
   (const_int 6)))])
 @end smallexample
 
+@cindex lock_length
+Usually, branch shortening is done assuming the worst case (i.e. longest)
+lengths, and then iterating (if optimizing) to smaller lengths till 
+no further changed occur.  This does not work so well for architectures
+that have very small minimum offsets and considerable jumps in instruction
+lengths.
+
+If you define the @code{lock_length} attribute, branch shortening will
+work the other way round: it starts out assuming minimum instruction
+lengths and iterates from there.  @code{lock_length} specifies an instruction
+length value that is calculated like @code{length} in every iteration,
+but if the value at the last iteration was larger, that larger previous
+value will be used instead.
+The value computed for the @code{length} attribute will be no smaller
+than that of the @code{lock_length} attribute, but you may still specify
+a larger value, and in that case @code{length} can decrease in the next
+iteration, but no

Re: [RFC] Make vectorizer to skip loops with small iteration estimate

2012-10-04 Thread Jan Hubicka
> > So SOC cancels out in the runtime check.
> > I still think we need two formulas - one determining if vectorization is
> > profitable, other specifying the threshold for scalar path at runtime (that
> > will generally give lower values).
> 
> True, we want two values.  But part of the scalar path right now
> is all the computation required for alias and alignment runtime checks
> (because the way all the conditions are combined).
> 
> I'm not much into the details of what we account for in SOC (I suppose
> it's everything we insert in the preheader of the vector loop).

Yes, it seems to contain everything we insert prior to the loop in unfolded form.
> 
> +  if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> +fprintf (vect_dump, "not vectorized: estimated iteration count too small.");
> +  if (vect_print_dump_info (REPORT_DETAILS))
> +fprintf (vect_dump, "not vectorized: estimated iteration count smaller than "
> + "user specified loop bound parameter or minimum "
> + "profitable iterations (whichever is more conservative).");
> 
> this won't work anymore btw - dumping infrastructure changed.

Ah, will update that.
> 
> I suppose your patch is a step in the right direction, but to really
> make progress we need to re-organize the loop and predicate structure
> produced by the vectorizer.

This reminds me of what I did for string functions on x86. It gets very hard
to get all the paths right when one starts to be really careful to not
output too much cruft on the short paths + do not consume too many registers.

In fact I want to re-think this for the SSE string ops patch, so I may try to
look into that incrementally.
> 
> So, please update your patch, re-test and then it's ok.

Thanks.
> > I tested enabling loop_ch in early passes with -fprofile-feedback and it is
> > SPEC neutral.  Given that it improves loop count estimates, I would still
> > like mainline doing that.  I do not like these quite important estimates to
> > be wrong most of the time.
> 
> I agree.  It also helps getting rid of once rolling loops I think.

I am attaching the patch for early-ch.  Will commit it tomorrow.

Concerning jump threading, it would help to do some of it during early passes
so the profile estimates do not get invalidated.  I tried to move VRP early, but
now it makes the compiler hang during bootstrap.  I will debug that.
> 
> > > 
> > > Btw, I added a "similar" check in vect_analyze_loop_operations:
> > > 
> > >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >&& (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
> > >   || ((max_niter = max_stmt_executions_int (loop)) != -1
> > >   && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> > > {
> > >   if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >  "not vectorized: iteration count too small.");
> > >   if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >  "not vectorized: iteration count smaller than "
> > >  "vectorization factor.");
> > >   return false;
> > > }
> > > 
> > > maybe you simply need to update that to also consider the profile?
> > 
> > Hmm, I am still getting familiar with the code. Later we have
> >   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >   && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> > {
> >   if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > fprintf (vect_dump, "not vectorized: vectorization not "
> >  "profitable.");
> >   if (vect_print_dump_info (REPORT_DETAILS))
> > fprintf (vect_dump, "not vectorized: iteration count smaller than "
> >  "user specified loop bound parameter or minimum "
> >  "profitable iterations (whichever is more conservative).");
> >   return false;
> > }
> > 
> > where th is always greater than or equal to vectorization_factor from the
> > cost model.
> > So this test seems redundant if the max_stmt_executions_int was pushed down
> > to the second conditional?
> 
> Yes, sort of.  The new check was supposed to be crystal clear, and
> even with the cost model disabled we want to not vectorize in this
> case.  But yes, the whole cost-model stuff needs TLC.

Ah yes, without the cost model we would skip it.  I suppose we do not need to
bother with the profile estimate in that case anyway.  They are kind of a part of
the cost models.

* passes.c (init_optimization_passes): Schedule early CH.
* tree-pass.h (pass_early_ch): Declare it.
* tree-ssa-loop-ch.c (gate_early_ch): New function.
(pass_early_ch): New pass.
Index: passes.c
===
--- passes.c(revision 191852)
+++ passes.c(working c

Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-04 Thread Jan Hubicka
> On Thu, Oct 4, 2012 at 4:01 PM, Jan Hubicka wrote:
> > * doc/invoke.texi (-fprofile-consistency-report): Document.
> > * common.opt (fprofile-consistency-report): New.
> > * toplev.h (dump_profile_consistency_report): Declare.
> > * toplev.c (finalize): Call dump_profile_consistency_report.
> > * passes.c (profile_record): New global var.
> > (check_profile_consistency): New function.
> > (dump_profile_consistency_report): New function.
> > (execute_one_ipa_transform_pass): Call check_profile_consistency.
> > (execute_one_pass): Likewise.
> 
> 
> Nice. And long overdue! :-)
> 
> 
> > +fprofile-consistency-report
> > +Common Report Var(profile_report)
> > +Report on consistency of profile
> 
> Maybe make this a -d flag instead of -f?

time-report and mem-report are also -f, so I guess we shall move all of them or 
none.
> 
> 
> > Index: passes.c
> > +/* Hold statistic about profile consistency.  */
> ...
> 
> I don't see why this should live in passes.c, can you please put it in
> a more logical place (profile.c, perhaps)?

Hmm, I guess predict.c then.
I had it there but then remembered Richard's effort to pull out functions that
are only called from elsewhere and do not use anything from the given unit ;)

Honza
> 
> Ciao!
> Steven


Re: libgo patch committed: Use libbacktrace

2012-10-04 Thread Ian Lance Taylor
On Thu, Oct 4, 2012 at 5:11 AM, Rainer Orth wrote:
> Ian Lance Taylor  writes:
>
>> This patch to libgo changes it to use libbacktrace.  Previously
>> backtraces required the Go package debug/elf to register itself with the
>> runtime during the package initialization, which only worked if the
>> program actually imported debug/elf one way or another.  Bootstrapped
>> and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to
>> mainline.
>
> Unfortunately, this breaks all use of libgo on versions of Solaris < 11
> which lack strnlen:
>
> Undefined                       first referenced
>  symbol                             in file
> strnlen                             /var/gcc/regression/trunk/10-gcc/build/i386-pc-solaris2.10/libgo/.libs/libgo.so
> ld: fatal: symbol referencing errors. No output written to a.out
> collect2: error: ld returned 1 exit status
> FAIL: bufio
>
> One could either try to also link libiberty into libgo.la, but that has
> the complication of needing to decide whether to use libiberty.a or
> pic/libiberty.a since libiberty is no libtool library.

I guess I won't try to link libgo against libiberty.  I just changed
libbacktrace to provide its own strnlen function, like so.
Bootstrapped and ran libbacktrace testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian


2012-10-04  Ian Lance Taylor  

* dwarf.c: If the system header files do not declare strnlen,
provide our own version.


foo.patch
Description: Binary data


Re: patch to fix

2012-10-04 Thread Kenneth Zadeck

On 10/04/2012 09:17 AM, Marc Glisse wrote:

On Wed, 3 Oct 2012, Mike Stump wrote:

On Oct 3, 2012, at 1:47 PM, Marc Glisse  wrote:
did you consider making the size of wide_int a template parameter, 
now that we are using C++? All with a convenient typedef or macro so 
it doesn't show. I am asking because in vrp I do some arithmetic 
that requires 2*N+1 bits where N is the size of double_int.


No, not really.  I'd maybe answer it this way, we put in a type 
(singular) to support all integral constants in all languages on a 
port. Since we only needed 1, there was little need to templatize 
it.  By supporting all integral constants in all languages, there is 
little need for more.  If Ada say, wanted a 2048 bit integer, then, 
we just have it drop off the size it wants someplace and we would mix 
that in on a MAX(….) line, net result, the type we use would then 
directly support the needs of Ada.  If vrp wanted 2x of all existing
modes, we could simply change the MAX equation and essentially double
it; if people need that.  This comes at a cost, as the intermediate
wide values are fixed size allocated (not variable); so these all 
would be larger.


And this cost could be eliminated by having a template wide_int_ so 
only the places that need it actually use the extra size ;-)


The space is not really an issue in most places since wide-ints tend to 
be short lived.  i guess vrp is slightly different because it creates a 
lot at once.  but then they go away.


However the real question is what are you going to instantiate the
template on?  What we do is look at the target and determine the
largest type that the target supports and build a wide int type that
supports that.  How are you going to do better?  Are you going to
instantiate one for every type you see?  Are these going to be static
or dynamic?  The last line of this email seems to imply that you were
planning to "know" that __int128 was the largest integer that any target
or front end could support.


And then what do you do for the parts of the compiler that have
operations that take things of two different types, like shift?  The
shift amount can be, and many times is, a shorter type than what is being
shifted.  Would these different length integers be represented with
different instances from the same template?  I am not a C++ programmer
and so all of this is a little new to me, but given a perspective of the
rest of the compiler, this does not seem like the right way to go.





On Wed, 3 Oct 2012, Kenneth Zadeck wrote:

i have already converted the vrp code, so i have some guess at where 
you are talking about.  (of course correct me if i am wrong).


the code that computes the range when two variables are multiplied
together needs to do a multiplication that produces a result that is
twice as wide as the inputs.


Yes, exactly.
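
To make the width requirement concrete, here is a minimal C sketch of range multiplication for 32-bit operands, with the products computed in a type twice as wide so they cannot overflow. It is only an illustration: `toy_range` and `toy_mul_range` are hypothetical names, and real VRP code works on trees and target modes, not on host integer types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a range held in the double-width type.  */
struct toy_range
{
  int64_t min;
  int64_t max;
};

/* Multiply the range [A_MIN, A_MAX] by [B_MIN, B_MAX].  The extremes
   of the product lie among the four corner products, each of which
   fits in 64 bits because the inputs are 32 bits wide.  */
struct toy_range
toy_mul_range (int32_t a_min, int32_t a_max, int32_t b_min, int32_t b_max)
{
  assert (a_min <= a_max && b_min <= b_max);

  int64_t corner[4] = {
    (int64_t) a_min * b_min, (int64_t) a_min * b_max,
    (int64_t) a_max * b_min, (int64_t) a_max * b_max
  };

  struct toy_range r = { corner[0], corner[0] };
  for (int i = 1; i < 4; i++)
    {
      if (corner[i] < r.min)
        r.min = corner[i];
      if (corner[i] > r.max)
        r.max = corner[i];
    }
  return r;
}
```

The same scheme for 64-bit operands needs a 128-bit intermediate, and for 128-bit operands a 256-bit one, which is where the question of how wide the widest supported integer must be comes from.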

my library is able to do that with one catch (and this is a big 
catch): the target has to have an integer mode that is twice as big 
as the mode of the operands. The issue is that wide-ints actually 
carry around the mode of the value in order to get the bitsize and 
precision of the operands (it does not have the type, because this 
code has to both work on the rtl and tree level and i generally do
not want the signedness anyway).


my current code in vrp checks to see if such a mode exists and if it 
does, it produces the product.   if the mode does not exist, it 
returns bottom.   What this means is that for most (many or some) 
targets that have a TImode, the largest thing that particular vrp can
discover ranges for is a DImode value.   We could get around this by
defining the next larger mode than what the target really needs but i 
wonder how much mileage you are going to get out of that with really 
large numbers.


This will be for discussion when you submit that next patch, but 
currently VRP handles integers the same size as double_int. In 
particular, it handles __int128. I would be unhappy if introducing a 
larger bigint type in gcc made us regress there.


You are only happy now because you do not really understand the world
around you.  This is not what your code does.  What your code does is
that if the host is a 64 bit host you can handle __int128 and if your
host is a 32 bit host you can handle an __int64.  If you are building a
cross compiler from a 32 bit host to a 64 bit target, your pass is
either going to get the wrong answer, give up, or ice.  There are
currently parts of gcc that do each of these three "solutions" and my
patch gets rid of these because it does the math as the target does the
math, no matter what the target is.


The goal of my patch is to make gcc produce the same correct results no
matter what types the target or host support.  The last thing that we
need is to have some optimization "knowing" what the limits of either of
these are and hard coding that in a set of templates that have been
statically instantiated.





[C++ testcase] PR 54323

2012-10-04 Thread Paolo Carlini

Hi,

I'm adding the testcase and closing the PR. Tested x86_64-linux.

Thanks,
Paolo.

/
2012-10-04  Paolo Carlini  

PR c++/54323
* g++.dg/cpp0x/pr54323.C: New.
Index: g++.dg/cpp0x/pr54323.C
===
--- g++.dg/cpp0x/pr54323.C  (revision 0)
+++ g++.dg/cpp0x/pr54323.C  (working copy)
@@ -0,0 +1,37 @@
+// PR c++/54323
+// { dg-do compile { target c++11 } }
+
+template<bool, typename T = void>
+struct enable_if { };
+
+template<typename T>
+struct enable_if<true, T>
+{ typedef T type; };
+
+template<template<typename> class CRTP, typename T>
+class Base
+{
+public:
+  template<template<typename> class CRTP0, typename T0, class>
+  friend int func(const Base<CRTP0, T0>& rhs);
+
+protected:
+  int n;
+};
+
+template<template<typename> class CRTP0, typename T0,
+         class = typename enable_if<true>::type>
+int func(const Base<CRTP0, T0>& rhs)
+{
+  return rhs.n;
+}
+
+template<typename T>
+class Derived : public Base<Derived, T> {};
+
+int main()
+{
+  Derived<int> x;
+  func(x);
+  return 0;
+}


Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

2012-10-04 Thread Vladimir Makarov

On 10/04/2012 03:24 AM, Steven Bosscher wrote:

On Thu, Oct 4, 2012 at 8:57 AM, Steven Bosscher  wrote:

On Thu, Oct 4, 2012 at 5:30 AM, Vladimir Makarov  wrote:

I was going to look at this code too but I was interested in generating
fewer points and live ranges.  It is strange that in my profiles,
remove_some_program_points_and_update_live_ranges takes 0.6% of compiler
time on these huge tests.   So I was not interested in speeding up the
function, and maybe that is why you have no visible change in compilation
time.

Right. The compression algorithm doesn't care much about the initial
number of program points, only about the number of live ranges before
and after compression. I had expected a bigger effect on the number of
live ranges before compression.

0.6% sounds really very different from my timings. How much time does
create_start_finish_chains take for you?



I don't object to the idea of the patch.  I need some time to look at it (the
different results on a function is a bit scary for me) and check simulator
times on other tests.

Understood.

BTW, it would be great if you can also look at this additional patch hunk:

@@ -994,8 +1044,8 @@ lra_create_live_ranges (bool all_p)
curr_point = 0;
point_freq_vec = VEC_alloc (int, heap, get_max_uid () * 2);
lra_point_freq = VEC_address (int, point_freq_vec);
-  FOR_EACH_BB (bb)
-process_bb_lives (bb);
+  FOR_EACH_BB_REVERSE (bb)
+process_bb_lives (bb, curr_point);
lra_live_max_point = curr_point;
create_start_finish_chains ();
if (lra_dump_file != NULL)

I think this should result in more live ranges being merged. Here's
why I think so, based on my far worse understanding of this code than
yours, so forgive me if I'm Completely Wrong :-)

No, you are not wrong.
Two days ago, I worked on a patch which contains the same code.  The patch
actually takes EBB into account to decrease # calls of mark_pseudo_live
at the beginning of process_bb_lives and mark_pseudo_dead at the
function end and for that I needed FOR_EACH_BB_REVERSE.  The patch was
half baked (it did not check hard reg live changes at the end of BB
to set up right hard reg conflicts for pseudos) but it gave an idea how
much I can get from this.  It is not bad but not what I expected.  So I
stopped work on this.  But we still should work on these ideas as they
improve LRA speed in small steps (many small steps will create a visible
effect).


We can really solve the scalability problem only by using simpler but still
good enough algorithms (too simple algorithms result in big code size 
and actually even in worse compilation times).  I've been working on it 
and I'll send a patch soon.

process_bb_lives walks insns in the basic block from last to first, so
say you have a basic block chain 1->2->3, and each block has 4 insns,
then AFAIU the program points in block 1 will be [4,3,2,1], in block 2
it will be [8,7,6,5], and in block 3 it will be [12,11,10,9]. Say a
reg is used in block 3 at point 11, and set in block 1 at point 3. Then
this reg will have a live range chain [3-1],[8-5],[12-11].

If you visit the basic blocks in reverse order, the program points
will be: 1:[12,11,10,9], 2:[8,7,6,5], 3:[4,3,2,1]. Now the same reg
will be set at point 11 and used at point 3, and the live range chain
will be just [11-3].
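
The numbering argument above can be checked with a small standalone C model. It is a sketch only: `toy_number_points` and `toy_count_live_ranges` are hypothetical helpers written for this example, not functions from lra-lives.c, and the model assumes a straight-line chain of three blocks with four insns each, exactly as in the example.

```c
#include <assert.h>
#include <string.h>

enum { N_BLOCKS = 3, N_INSNS = 4, N_POINTS = N_BLOCKS * N_INSNS };

/* Assign program points.  POINT[b][i] is the point of insn I (in
   execution order) of block B.  Blocks are visited first-to-last, or
   last-to-first when REVERSE is nonzero; within a block the insns are
   walked from last to first, as described for process_bb_lives.  */
void
toy_number_points (int reverse, int point[N_BLOCKS][N_INSNS])
{
  int curr_point = 1;
  for (int k = 0; k < N_BLOCKS; k++)
    {
      int b = reverse ? N_BLOCKS - 1 - k : k;
      for (int i = N_INSNS - 1; i >= 0; i--)
        point[b][i] = curr_point++;
    }
}

/* Count the maximal runs of consecutive program points on which a
   register is live, given that it is set at insn DEF_INSN of block
   DEF_BB and last used at insn USE_INSN of block USE_BB.  */
int
toy_count_live_ranges (int reverse, int def_bb, int def_insn,
                       int use_bb, int use_insn)
{
  int point[N_BLOCKS][N_INSNS];
  char live[N_POINTS + 2];

  assert (def_bb >= 0 && use_bb < N_BLOCKS);
  assert (def_insn >= 0 && def_insn < N_INSNS);
  assert (use_insn >= 0 && use_insn < N_INSNS);
  assert (def_bb < use_bb || (def_bb == use_bb && def_insn <= use_insn));

  toy_number_points (reverse, point);
  memset (live, 0, sizeof live);

  /* Mark every point between the definition and the last use.  */
  for (int b = def_bb; b <= use_bb; b++)
    {
      int first = b == def_bb ? def_insn : 0;
      int last = b == use_bb ? use_insn : N_INSNS - 1;
      for (int i = first; i <= last; i++)
        live[point[b][i]] = 1;
    }

  /* A new range starts wherever a live point follows a dead one.  */
  int ranges = 0;
  for (int p = 1; p <= N_POINTS; p++)
    if (live[p] && !live[p - 1])
      ranges++;
  return ranges;
}
```

With the register set at the second insn of block 1 and used at the second insn of block 3 (indices 1 in blocks 0 and 2 here), forward block order yields three separate ranges and reverse block order yields one, matching the chains [3-1],[8-5],[12-11] and [11-3] in the text.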






Re: RFA: darwin PATCH to fix build, internal visibility

2012-10-04 Thread Jason Merrill

On 10/04/2012 09:41 AM, Jack Howarth wrote:

It appears that the patch should also special case the "scan-assembler 
.internal.*Foo.methodEv"
tests in g++.dg/ext/visibility/pragma-override1.C and 
g++.dg/ext/visibility/pragma-override2.C
on darwin as well...


Done, thanks.

Jason




Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

2012-10-04 Thread Vladimir Makarov

On 10/04/2012 05:43 AM, Steven Bosscher wrote:

On Wed, Oct 3, 2012 at 5:35 PM, Steven Bosscher  wrote:

The "worst" result is this:
Compressing live ranges: from 726174 to 64496 - 8%, pre_count 40476128, post_count 12483414

But that's still a lot better than before the patch for the same function:
Compressing live ranges: from 1742569 to 73069 - 4%, pre_count 40842330, post_count 12479992

Walking basic blocks with FOR_EACH_BB_REVERSE gives:

Only FOR_EACH_BB_REVERSE:
Compressing live ranges: from 1742579 to 429746 - 24% pre_count 41106212, post_count 34376494
Compressing live ranges: from 1742569 to 63000 - 3% pre_count 40835340, post_count 11055747

FOR_EACH_BB_REVERSE + need_curr_point_incr:
Compressing live ranges: from 726184 to 416529 - 57% pre_count 40743516, post_count 34376846
Compressing live ranges: from 726174 to 61840 - 8% pre_count 40472806, post_count 11055747

The combination of the two changes takes ~20s off the ~180s for "LRA
create live ranges".



Wow.  I did not see such an effect.  What machine do you use?


Re: [testsuite] Add support for -fdump-ada-spec

2012-10-04 Thread Mike Stump
On Oct 4, 2012, at 4:10 AM, Eric Botcazou  wrote:
> the recent breakage of -fdump-ada-spec revealed that the switch isn't 
> exercised in the testsuite.  This patch adds the missing support.
> 
> Tested on x86_64-suse-linux, OK for mainline?

Ok.

> What of the 4.7 branch?

I don't see the point…  But if you do…  let the dust settle… ok.


Small PATCH to rs6000.c to fix cross-compiler build without gas

2012-10-04 Thread Jason Merrill
If configure doesn't think that the assembler supports weak symbols, 
rs6000.c fails to compile because ASM_WEAKEN_DECL isn't defined.  So 
let's not use it in that case.


OK for trunk?
commit 5f0878b79d1a42795aca2fabc8d70eefa2e29fa6
Author: Jason Merrill 
Date:   Wed Oct 3 15:55:48 2012 -0400

	* config/rs6000/rs6000.c (rs6000_code_end): Protect the use of
	ASM_WEAKEN_DECL with #if RS6000_WEAK.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 3e3d553..f4e4dec 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -28295,6 +28295,7 @@ rs6000_code_end (void)
   TREE_PUBLIC (decl) = 1;
   TREE_STATIC (decl) = 1;
 
+#if RS6000_WEAK
   if (USE_HIDDEN_LINKONCE)
 {
   DECL_COMDAT_GROUP (decl) = DECL_ASSEMBLER_NAME (decl);
@@ -28307,6 +28308,7 @@ rs6000_code_end (void)
   ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
 }
   else
+#endif
 {
   switch_to_section (text_section);
   ASM_OUTPUT_LABEL (asm_out_file, name);


Re: patch to fix

2012-10-04 Thread Kenneth Zadeck
Actually richi, this code is "correct" for some broken definition of 
correct.


If all that is done is to convert the rtl parts of the compiler, then 
this code is the best you can do (of course an assertion that the length 
is not greater than 2 would be a useful addition).


The code that is in the follow-on patch converts the insides of a
tree cst to look like a const wide int, i.e. an array of HWIs.   When
that happens, this code looks completely different. But if you only 
convert the rtl level, at some point there is going to be an impedance 
mismatch and it is buried here.


I will point out that this is the fall out of trying to split things 
into a bunch of smaller patches that could in theory go in separately.


kenny






+/* Constructs tree in type TYPE from with value given by CST.  Signedness
+   of CST is assumed to be the same as the signedness of TYPE.  */
+
+tree
+wide_int_to_tree (tree type, const wide_int &cst)
+{
+  wide_int v;
+  if (TYPE_UNSIGNED (type))
+v = cst.zext (TYPE_PRECISION (type));
+  else
+v = cst.sext (TYPE_PRECISION (type));
+
+  return build_int_cst_wide (type, v.elt (0), v.elt (1));
+}

is surely broken.  A wide-int does not fit a double-int.  How are you
going to "fix" this?

Thanks,
Richard.


kenny




PATCH to acinclude.m4 to fix gas version detection

2012-10-04 Thread Jason Merrill
Recent versions of binutils seem to have started putting ' around the 
version number in bfd/configure.in, which was confusing gcc configure. 
This patch allows us to detect the version number again.


OK for trunk?
commit f9ce75775fe4392ee92893c46e89e17dc31bb816
Author: Jason Merrill 
Date:   Thu Oct 4 00:55:33 2012 -0400

	* acinclude.m4 (_gcc_COMPUTE_GAS_VERSION): Handle ' around version
	number.
	* configure: Regenerate.

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index c24464b..d5eb4da 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -393,7 +393,7 @@ for f in $gcc_cv_as_bfd_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure.in \
  $gcc_cv_as_gas_srcdir/Makefile.in ; do
-  gcc_cv_gas_version=`sed -n -e 's/^[[ 	]]*\(VERSION=[[0-9]]*\.[[0-9]]*.*\)/\1/p' < $f`
+  gcc_cv_gas_version=`sed -n -e "s/^[[ 	]]*VERSION='*\([[0-9]]*\.[[0-9]]*[^']*\)'*/VERSION=\1/p" < $f`
   if test x$gcc_cv_gas_version != x; then
 break
   fi
diff --git a/gcc/configure b/gcc/configure
index 45bba8e..2d71f7d 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -21237,7 +21237,7 @@ for f in $gcc_cv_as_bfd_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure.in \
  $gcc_cv_as_gas_srcdir/Makefile.in ; do
-  gcc_cv_gas_version=`sed -n -e 's/^[ 	]*\(VERSION=[0-9]*\.[0-9]*.*\)/\1/p' < $f`
+  gcc_cv_gas_version=`sed -n -e "s/^[[ 	]]*VERSION='*\([[0-9]]*\.[[0-9]]*[^']*\)'*/VERSION=\1/p" < $f`
   if test x$gcc_cv_gas_version != x; then
 break
   fi


Profile housekeeping 7/n minor updating fixes

2012-10-04 Thread Jan Hubicka
Hi,
here are a few minor issues I ran across while auditing the profile updating code.
The function.c changes solve some misupdates in jump threading.
tree-inline and except can be seen with -fprofile-feedback.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* tree-inline.c (tree_function_versioning): Set counts of split 
functions.
* function.c (dup_block_and_redirect): Update the whole path through
epilogue.
(convert_jumps_to_returns): Update profile after redirection.
* cfgexpand.c (expand_gimple_tailcall): Initialize new edge.
* except.c (dw2_build_landing_pads, sjlj_emit_dispatch_table):
Initialize profile of newly created basic blocks.
Index: tree-inline.c
===
*** tree-inline.c   (revision 192080)
--- tree-inline.c   (working copy)
*** tree_function_versioning (tree old_decl,
*** 5251,5256 
--- 5251,5258 
if (new_entry)
  {
struct cgraph_edge *e;
+   ENTRY_BLOCK_PTR->count = new_entry->count;
+   EXIT_BLOCK_PTR->count = new_entry->count;
rebuild_frequencies ();
  
new_version_node->count = ENTRY_BLOCK_PTR->count;
Index: function.c
===
*** function.c  (revision 192080)
--- function.c  (working copy)
*** dup_block_and_redirect (basic_block bb,
*** 5670,5683 
{
int freq = EDGE_FREQUENCY (e);
copy_bb->count += e->count;
!   copy_bb->frequency += EDGE_FREQUENCY (e);
!   e->dest->count -= e->count;
!   if (e->dest->count < 0)
! e->dest->count = 0;
!   e->dest->frequency -= freq;
!   if (e->dest->frequency < 0)
! e->dest->frequency = 0;
!   redirect_edge_and_branch_force (e, copy_bb);
continue;
}
  else
--- 5670,5690 
{
int freq = EDGE_FREQUENCY (e);
copy_bb->count += e->count;
!   copy_bb->frequency += freq;
!   while (bb != EXIT_BLOCK_PTR)
! {
!   edge se;
!   bb->count -= e->count;
!   if (bb->count < 0)
! bb->count = 0;
!   bb->frequency -= freq;
!   if (bb->frequency < 0)
! bb->frequency = 0;
!   se = single_succ_edge (bb);
!   se->count -= e->count;
!   bb = se->dest;
! }
! redirect_edge_and_branch_force (e, copy_bb);
continue;
}
  else
*** convert_jumps_to_returns (basic_block la
*** 5797,5802 
--- 5804,5815 
}
  
/* Fix up the CFG for the successful change we just made.  */
+   e->dest->count -= e->count;
+   if (e->dest->count < 0)
+   e->dest->count = 0;
+   e->dest->frequency -= EDGE_FREQUENCY (e);
+   if (e->dest->frequency < 0)
+   e->dest->frequency = 0;
redirect_edge_succ (e, EXIT_BLOCK_PTR);
e->flags &= ~EDGE_CROSSING;
  }
*** epilogue_done:
*** 6581,6587 
  emit_barrier_after (start);
  
  *pdest_bb = bb;
! make_edge (bb, EXIT_BLOCK_PTR, 0);
}
  redirect_edge_and_branch_force (e, *pdest_bb);
}
--- 6599,6607 
  emit_barrier_after (start);
  
  *pdest_bb = bb;
! se = make_edge (bb, EXIT_BLOCK_PTR, 0);
! se->count = e->count;
! se->probability = REG_BR_PROB_BASE;
}
  redirect_edge_and_branch_force (e, *pdest_bb);
}
Index: cfgexpand.c
===
*** cfgexpand.c (revision 192080)
--- cfgexpand.c (working copy)
*** expand_gimple_tailcall (basic_block bb,
*** 2313,2320 
  }
  
e = make_edge (bb, EXIT_BLOCK_PTR, EDGE_ABNORMAL | EDGE_SIBCALL);
!   e->probability += probability;
!   e->count += count;
BB_END (bb) = last;
update_bb_for_insn (bb);
  
--- 2313,2320 
  }
  
e = make_edge (bb, EXIT_BLOCK_PTR, EDGE_ABNORMAL | EDGE_SIBCALL);
!   e->probability = probability;
!   e->count = count;
BB_END (bb) = last;
update_bb_for_insn (bb);
  
Index: except.c
===
*** except.c(revision 192080)
--- except.c(working copy)
*** dw2_build_landing_pads (void)
*** 988,993 
--- 988,995 
  
bb = emit_to_new_bb_before (seq, label_rtx (lp->post_landing_pad));
e = make_edge (bb, bb->next_bb, e_flags);
+   bb->count = bb->next_bb->count;
+   bb->frequency = bb->next_bb->frequency;
e->count = bb->count;
e->probability = REG_BR_PROB_BASE;
if (current_loops)
*** sjlj_emit_dispatch_table (rtx dispatch_l
*** 1332,1337 
--- 1334,1341 
  
before = label_rtx (lp->post_landing_pad);
bb = emit_to_new_bb_before (seq2, before);
+ bb->c
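
The dup_block_and_redirect hunk above subtracts the redirected edge's count and frequency from every block on the path to the exit block, clamping at zero. The following is only a simplified sketch of that idea; struct toy_block and remove_flow_from_path are made-up stand-ins, not GCC's basic_block or any GCC function, and the exit block is modelled here as the block with no successor.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a basic block; not GCC's basic_block.  */
struct toy_block
{
  long count;              /* Profile count of the block.  */
  int frequency;           /* Estimated frequency of the block.  */
  long succ_count;         /* Count on the single outgoing edge.  */
  struct toy_block *succ;  /* Single successor, NULL for the exit.  */
};

/* Remove COUNT/FREQ worth of flow from BB and every block reachable
   through single-successor edges, stopping at the exit block.  Block
   counts and frequencies are clamped at zero; the edge count is not,
   mirroring the hunk.  */
void
remove_flow_from_path (struct toy_block *bb, long count, int freq)
{
  assert (count >= 0 && freq >= 0);
  while (bb != NULL && bb->succ != NULL)
    {
      bb->count -= count;
      if (bb->count < 0)
        bb->count = 0;
      bb->frequency -= freq;
      if (bb->frequency < 0)
        bb->frequency = 0;
      bb->succ_count -= count;
      bb = bb->succ;
    }
}
```

The clamping matters because profile data is only an estimate: subtracting a redirected edge can otherwise leave a block with a negative count.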

[lra] patch to solve most scalability problems for LRA

2012-10-04 Thread Vladimir Makarov

  The following patch solves most of the LRA scalability problems.

  It switches on simpler algorithms in LRA.  First, it switches off
trying to reassign hard registers to spilled pseudos (for such huge
functions they usually have long live ranges, so the chance of
assigning them a hard register is very small, while trying to do so
is too expensive), inheritance, live range splitting, and memory
coalescing optimizations.  It seems that rematerialization is too
important for performance, so I don't switch it off.  As splitting is
also necessary for generation of caller-saves code, I switch off
caller-saves in IRA and force IRA to do non-regional RA.

  Here are the results on the huge tests in question.  The testing was
done on a Core i7-2600 with 16GB memory to exclude swapping and make cpu
time close to real time (with 8GB, LRA was already faster than reload
in real time when swapping occurred, because of better code/data
locality).

  In the following table, time means cpu time of all GCC, size is text
segment size of generated code (data and bss segments are all the same
for a given test), and RA% means the % of all compiler cpu time spent
in RA (IRA+reload or IRA+LRA) taken from -ftime-report (it is
approximate because it is a sum of approximate numbers, some of which
are reported as 0% although they are not exactly 0%).

  The first row of the table is for the current IRA and reload, the 2nd
row is for the current IRA+LRA on the branch, and the 3rd row is the
current IRA+LRA with the patch.

  Because of limited space I put data only for -m32 for the first 2
tests (-m64 has similar results); the 3rd test is for 64 bits because
it cannot be correctly compiled for 32 bits.


time/size/RA%          PR26854                  PR37448                  PR54146
reload         565.15s  2102964  15%    293.47s  3122140   3%    349.93s  6556630  19%
lra            624.18s  1707580  22%    311.76s  3221620   9%    469.30s  6277934  40%
patched lra    524.51s  1720620   8%    287.83s  3002372   1%    399.32s  6395351  30%


  IRA+patched LRA behaves better than IRA+reload from the compilation
time and code size point of view.

  Interestingly enough, for PR37448 the regular algorithms result in
bigger code size than the simpler ones.  I guess it is because of
live-range splitting.  It can be profitable but has a tendency to
generate bigger code.

  The only issue now is PR54146 compilation time for IRA+LRA, although
it was improved significantly.  I will continue to work on PR54146.
But now I am going to focus on proposals from reviews.

  The patch was successfully bootstrapped on x86/x86-64.  I did it
twice, when simpler algorithms are always switched on (by setting
threshold #pseudos * #basic blocks very small) and when simpler
algorithms are used for huge functions (I believe there are no such
functions in GCC).
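
To make the threshold concrete, here is a small sketch of the size test used in the ira.c hunk (max_reg_num () >= (1 << 26) / last_basic_block). The function name use_simple_lra_p and its parameters are illustrative only; the real code reads GCC globals instead.

```c
#include <assert.h>

/* Illustrative version of the size test from the ira.c hunk.
   USE_LRA stands in for ira_use_lra_p, MAX_REGS for max_reg_num (),
   and N_BLOCKS for last_basic_block.  Returns 1 when the product
   #pseudos * #blocks is large enough to select the simpler
   algorithms.  */
int
use_simple_lra_p (int use_lra, int max_regs, int n_blocks)
{
  assert (n_blocks > 0);
  /* Dividing instead of multiplying avoids integer overflow for very
     large functions.  */
  return use_lra && max_regs >= (1 << 26) / n_blocks;
}
```

With this test, 10K pseudos and 10K blocks, or 100K pseudos and 1K blocks, select the simplified algorithms, while a function with 1K pseudos and 1K blocks does not.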

Committed as rev. 192086.

2012-10-04  Vladimir Makarov  

* lra.h (lra_simple_p): New external.
* lra.c (lra_simple_p): New global var.
(lra): Switch off inheritance and coalescing if lra_simple_p.
* lra-assigns.c (assign_by_spills): Don't try to reassign spilled
pseudos if lra_simple_p.
* ira.c (ira): Set up lra_simple_p and ira_conflicts_p.  Set up
and restore flag_caller_saves and flag_ira_region.

Index: ira.c
===
--- ira.c	(revision 192048)
+++ ira.c	(working copy)
@@ -4327,8 +4327,26 @@ ira (FILE *f)
   bool loops_p;
   int max_regno_before_ira, ira_max_point_before_emit;
   int rebuild_p;
+  bool saved_flag_caller_saves = flag_caller_saves;
+  enum ira_region saved_flag_ira_region = flag_ira_region;
+
+  ira_conflicts_p = optimize > 0;
 
   ira_use_lra_p = targetm.lra_p ();
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K
+ pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+ use simplified and faster algorithms in LRA.  */
+  lra_simple_p
+= (ira_use_lra_p && max_reg_num () >= (1 << 26) / last_basic_block);
+  if (lra_simple_p)
+{
+  /* It permits to skip live range splitting in LRA.  */
+  flag_caller_saves = false;
+  /* There is no sense to do regional allocation when we use
+	 simplified LRA.  */
+  flag_ira_region = IRA_REGION_ONE;
+  ira_conflicts_p = false;
+}
 
 #ifndef IRA_NO_OBSTACK
   gcc_obstack_init (&ira_obstack);
@@ -4349,7 +4367,6 @@ ira (FILE *f)
   ira_dump_file = stderr;
 }
 
-  ira_conflicts_p = optimize > 0;
   setup_prohibited_mode_move_regs ();
 
   df_note_add_problem ();
@@ -4530,6 +4547,13 @@ ira (FILE *f)
   /* See comment for find_moveable_pseudos call.  */
   if (ira_conflicts_p)
 move_unallocated_pseudos ();
+
+  /* Restore original values.  */
+  if (lra_simple_p)
+{
+  flag_caller_saves = saved_flag_caller_saves;
+  flag_ira_region = saved_flag_ira_region;
+}
 }
 
 static void
Index: lra-assigns.c
===
--- lra-assigns.c	(revision 192050)
+++ lra-assigns.c	(wo

Re: RFC: LRA for x86/x86-64 [7/9] -- continuation

2012-10-04 Thread Richard Sandiford
Hi Vlad,

This message is for lra-assigns.c.  Sorry for the piecemeal reviews,
never sure when I'll get time...

> +/* This file contains a pass mostly assigning hard registers to reload
> +   pseudos.  There is no any RTL code transformation on this pass.

Maybe:

/* This file's main objective is to assign hard registers to reload pseudos.
   It also tries to allocate hard registers to other pseudos, but at a lower
   priority than the reload pseudos.  The pass does not transform the RTL.

if that's accurate.

> +   Reload pseudos get what they need (usually) hard registers in
> +   anyway possibly by spilling non-reload pseudos and by assignment
> +   reload pseudos with smallest number of available hard registers
> +   first.
> +
> +   If reload pseudos can get hard registers only through spilling
> +   other pseudos, we choose what pseudos to spill taking into account
> +   how given reload pseudo benefits and also how other reload pseudos
> +   not assigned yet benefit too (see function spill_for).

Maybe:

   We must allocate a hard register to every reload pseudo.  We try to
   increase the chances of finding a viable allocation by assigning the
   pseudos in order of fewest available hard registers first.  If we
   still fail to find a hard register, we spill other (non-reload)
   pseudos in order to make room.

   assign_hard_regno_for allocates registers without spilling.
   spill_for does the same with spilling.  Both functions use
   a cost model to determine the most profitable choice of
   hard and spill registers.

> +   Non-reload pseudos can get hard registers too if it is possible and
> +   improves the code.  It might be possible because of spilling
> +   non-reload pseudos on given pass.

Maybe:

   Once we have finished allocating reload pseudos, we also try to
   assign registers to other (non-reload) pseudos.  This is useful
   if hard registers were freed up by the spilling just described.

> +   We try to assign hard registers processing pseudos by threads.  The
> +   thread contains reload and inheritance pseudos connected by copies
> +   (move insns).  It improves the chance to get the same hard register
> +   to pseudos in the thread and, as the result, to remove some move
> +   insns.

Maybe:

   We try to assign hard registers by collecting pseudos into threads.
   These threads contain reload and inheritance pseudos that are connected
   by copies (move insns).  Doing this improves the chances of pseudos
   in the thread getting the same hard register and, as a result,
   of allowing some move insns to be deleted.

> +   When we assign hard register to a pseudo, we decrease the cost of
> +   the hard registers for corresponding pseudos connected by copies.

Maybe:

   When we assign a hard register to a pseudo, we decrease the cost of
   using the same hard register for pseudos that are connected by copies.

> +   If two hard registers are equally good for assigning the pseudo
> +   with hard register cost point of view, we prefer a hard register in
> +   smaller register bank.  By default, there is only one register
> +   bank.  A target can define register banks by hook
> +   register_bank. For example, x86-64 has a few register banks: hard
> +   regs with and without REX prefixes are in different banks.  It
> +   permits to generate smaller code as insns without REX prefix are
> +   shorter.

Maybe:

   If two hard registers have the same frequency-derived cost,
   we prefer hard registers in lower register banks.  The mapping
   of registers to banks is controlled by the register_bank target hook.
   For example, x86-64 has a few register banks: hard registers with and
   without REX prefixes are in different banks.  This permits us
   to generate smaller code as insns without REX prefixes are shorter.

although this might change if the name of the hook changes.

> +/* Info about pseudo used during the assignment pass.  Thread is a set
> +   of connected reload and inheritance pseudos with the same set of
> +   available hard reg set.  Thread is a pseudo itself for other
> +   cases.  */
> +struct regno_assign_info

Maybe:

/* Information about the thread to which a pseudo belongs.  Threads are
   a set of connected reload and inheritance pseudos with the same set of
   available hard registers.  Lone registers belong to their own threads.  */

Although the condition seems to be:

> + && (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
> + == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))

i.e. the same _number_ of available hard regs, but not necessarily the
same set.
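
To illustrate the difference, here is a toy sketch in which a register class is modelled as a bitmask of hard registers. The names are made up for this example and do not correspond to anything in LRA; the point is only that two classes can have the same number of registers without having the same registers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of hard registers in a toy register class, where each set
   bit of SET stands for one hard register.  */
static int
toy_hard_regs_num (uint32_t set)
{
  int n = 0;
  while (set != 0)
    {
      set &= set - 1;   /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}

/* The condition the code tests: equal register counts.  */
int
toy_same_count_p (uint32_t a, uint32_t b)
{
  return toy_hard_regs_num (a) == toy_hard_regs_num (b);
}

/* The condition the comment describes: identical register sets.  */
int
toy_same_set_p (uint32_t a, uint32_t b)
{
  return a == b;
}

/* Two disjoint classes of four registers each pass the first test
   but not the second.  */
void
toy_check_example (void)
{
  uint32_t low = 0x0f, high = 0xf0;
  assert (toy_same_count_p (low, high));
  assert (!toy_same_set_p (low, high));
}
```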

"thread" might be more mnemonic than "regno_assign" in this file,
but that's bikeshed stuff.

> +  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
> +{
> +  regno_assign_info[i].first = i;
> +  regno_assign_info[i].next = -1;
> +  regno_assign_info[i].freq = lra_reg_info[i].freq;
> +}

Minor speedup, but it's probably worth caching max_reg_n

Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-10-04 Thread Mike Stump
On Oct 4, 2012, at 6:06 AM, Iain Buclaw  wrote:
> I would like to get a bump on this.
> 
> It's been a while, and there have been quite a number of changes since
> the initial post that address many of the issues raised.  Rather than
> reposting patches, someone mentioned attaching changelog, well, here
> it is.
> 
> Repository is still located here: https://github.com/D-Programming-GDC/GDC
> 
> Would it be possible to have a re-newed review?

You don't ask, you post.  If you have independent patches to the rest of gcc
that improve it, I'd suggest posting those as separate patches and getting
those in.  This gives you a base onto which to slot in the front-end.


Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

2012-10-04 Thread Vladimir Makarov

On 10/04/2012 02:57 AM, Steven Bosscher wrote:
> On Thu, Oct 4, 2012 at 5:30 AM, Vladimir Makarov  wrote:
>> I was going to look at this code too but I was interesting in generation of
>> less points and live ranges.  It is strange that in my profiles,
>> remove_some_program_points_and_update_live_ranges takes 0.6% of compiler
>> time on these huge tests.   So I was not interesting to speed up the
>> function and may be therefore you have no visible change in compilation
>> time.
>
> Right. The compression algorithm doesn't care much about the initial
> number of program points, only about the number of live ranges before
> and after compression. I had expected a bigger effect on the number of
> live ranges before compression.
>
> 0.6% sounds really very different from my timings. How much time does
> create_start_finish_chains take for you?

0.65% (2.78s).

Actually, I have a profile but I am not sure now that it is for
PR54146.  It might be for PR26854.

I'll check it again to be sure.


Re: PATCH trunk: gengtype honoring mark_hook-s inside struct inside union-s

2012-10-04 Thread Laurynas Biveinis
> 2012-10-03  Basile Starynkevitch  
>
> * gengtype.c (walk_type): Emit mark_hook when inside a
>   struct of a union member.

This is OK.

Thanks,
-- 
Laurynas


Re: Small PATCH to rs6000.c to fix cross-compiler build without gas

2012-10-04 Thread David Edelsohn
On Thu, Oct 4, 2012 at 11:38 AM, Jason Merrill  wrote:
> If configure doesn't think that the assembler supports weak symbols,
> rs6000.c fails to compile because ASM_WEAKEN_DECL isn't defined.  So let's
> not use it in that case.
>
> OK for trunk?

This patch is okay.

Where are you compiling GCC for PowerPC that configure thinks the
assembler does not support weak symbols?

Thanks, David


PING^2: [patch] pr/54508: fix incomplete debug information for class

2012-10-04 Thread Paul_Koning
Ping ^ 2...

--

If the only reference in a source file is to a static method of a class, then 
GCC would output debug information for the class name but not any of its 
members or base classes.  The attached patch fixes this by having 
"prune_unused_types_mark" mark all of the parent's children if the parent DIE 
type is for a class.
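
The decision the patch adds can be sketched as a small predicate on the parent DIE's tag. The enum and function names below are hypothetical and only for illustration; the numeric values are the tag encodings assigned by the DWARF standard.

```c
#include <assert.h>

/* Tag encodings as assigned by the DWARF standard; the TOY_ prefix
   marks these as local to this sketch.  */
enum toy_dw_tag
{
  TOY_DW_TAG_class_type = 0x02,
  TOY_DW_TAG_structure_type = 0x13,
  TOY_DW_TAG_union_type = 0x17,
  TOY_DW_TAG_namespace = 0x39
};

/* Return 1 if marking a DIE as used should also mark all children of
   its parent, i.e. when the parent is a class, struct or union.  */
int
toy_mark_parents_children_p (enum toy_dw_tag parent_tag)
{
  return (parent_tag == TOY_DW_TAG_class_type
          || parent_tag == TOY_DW_TAG_structure_type
          || parent_tag == TOY_DW_TAG_union_type);
}

/* A namespace parent must not drag in its other members.  */
void
toy_check_namespace (void)
{
  assert (!toy_mark_parents_children_p (TOY_DW_TAG_namespace));
}
```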

The associated new testcase verifies this, and also verifies that
references to a function in a namespace do *not* cause other parts of that
namespace to be emitted as debug information, but that references to a method
in a class (or struct or union) do emit the other information for that class.

Checked by "make check" on dwarf2.exp.

Ok to commit?  This would close PR/54508.

paul

ChangeLog:

2012-09-17  Paul Koning  

* dwarf2out.c (prune_unused_types_mark): Mark all of parent's
children if parent is a class.

testsuite/ChangeLog:

2012-09-17  Paul Koning  

* g++.dg/debug/dwarf2/pr54508.C: New.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 191408)
+++ gcc/dwarf2out.c (working copy)
@@ -21033,9 +21033,13 @@
  prune_unused_types_mark_generic_parms_dies (die);

  /* We also have to mark its parents as used.
-(But we don't want to mark our parents' kids due to this.)  */
+(But we don't want to mark our parent's kids due to this,
+unless it is a class.)  */
  if (die->die_parent)
-   prune_unused_types_mark (die->die_parent, 0);
+   prune_unused_types_mark (die->die_parent, 
+(die->die_parent->die_tag == DW_TAG_class_type ||
+ die->die_parent->die_tag == DW_TAG_structure_type ||
+ die->die_parent->die_tag == DW_TAG_union_type));

  /* Mark any referenced nodes.  */
  prune_unused_types_walk_attribs (die);
Index: testsuite/g++.dg/debug/dwarf2/pr54508.C
===
--- testsuite/g++.dg/debug/dwarf2/pr54508.C (revision 0)
+++ testsuite/g++.dg/debug/dwarf2/pr54508.C (revision 0)
@@ -0,0 +1,67 @@
+// PR debug/54508
+// { dg-do compile }
+// { dg-options "-g2 -dA" }
+
+// { dg-final { scan-assembler "\"cbase0\"\[ \t\]+\[#;/!|@\]+ DW_AT_name\|DW_AT_name: \"cbase\"" } }
+// { dg-final { scan-assembler "\"OPCODE0\"\[ \t\]+\[#;/!|@\]+ DW_AT_name\|DW_AT_name: \"OPCODE\"" } }
+// { dg-final { scan-assembler "\"bi0\"\[ \t\]+\[#;/!|@\]+ DW_AT_name" } }
+// { dg-final { scan-assembler "\"si0\"\[ \t\]+\[#;/!|@\]+ DW_AT_name" } }
+// { dg-final { scan-assembler "\"f10\"\[ \t\]+\[#;/!|@\]+ DW_AT_name" } }
+// { dg-final { scan-assembler "\"f20\"\[ \t\]+\[#;/!|@\]+ DW_AT_name" } }
+// { dg-final { scan-assembler-not "\"nc0\"\[ \t\]+\# DW_AT_name\|DW_AT_name: \"nc\"" } }
+
+class cbase
+
+{
+public:
+ static int si;
+int bi;
+};
+
+class c : public cbase
+
+{
+public:
+ enum
+ {
+  OPCODE = 251
+ };
+ int i ;
+ static const char *testc (void) { return "foo"; }
+};
+
+struct s
+{
+int f1;
+static const char *tests (void) { return "test"; }
+};
+
+union u
+{
+int f2;
+double d;
+static const char *testu (void) { return "test union"; }
+};
+
+namespace n 
+{
+const char *ntest (void) { return "test n"; }
+
+class nc
+{
+public:
+int i;
+static int sj;
+};
+}
+
+extern void send (int, int, const void *, int);
+
+void test (int src)
+{
+  int cookie = 1;
+  send(src, c::OPCODE, c::testc (), cookie);
+  send(src, c::OPCODE, s::tests (), cookie);
+  send(src, c::OPCODE, u::testu (), cookie);
+  send(src, c::OPCODE, n::ntest (), cookie);
+}



Re: [patch][lra] Improve initial program point density in lra-lives.c

2012-10-04 Thread Vladimir Makarov

On 10/04/2012 11:45 AM, Vladimir Makarov wrote:
> On 10/04/2012 02:57 AM, Steven Bosscher wrote:
>> On Thu, Oct 4, 2012 at 5:30 AM, Vladimir Makarov  wrote:
>>> I was going to look at this code too but I was interesting in generation of
>>> less points and live ranges.  It is strange that in my profiles,
>>> remove_some_program_points_and_update_live_ranges takes 0.6% of compiler
>>> time on these huge tests.   So I was not interesting to speed up the
>>> function and may be therefore you have no visible change in compilation
>>> time.
>>
>> Right. The compression algorithm doesn't care much about the initial
>> number of program points, only about the number of live ranges before
>> and after compression. I had expected a bigger effect on the number of
>> live ranges before compression.
>>
>> 0.6% sounds really very different from my timings. How much time does
>> create_start_finish_chains take for you?
>
> 0.65% (2.78s).
>
> Actually, I have a profile but I am not sure now that it is for
> PR54146.  It might be for PR26854.
>
> I'll check it again to be sure.

No, it looks about the same.


Re: [RFC] Make vectorizer to skip loops with small iteration estimate

2012-10-04 Thread Richard Guenther
On Thu, 4 Oct 2012, Jan Hubicka wrote:

> > > So SOC cancels out in the runtime check.
> > > I still think we need two formulas - one determining if vectorization is
> > > profitable, other specifying the threshold for scalar path at runtime 
> > > (that
> > > will generally give lower values).
> > 
> > True, we want two values.  But part of the scalar path right now
> > is all the computation required for alias and alignment runtime checks
> > (because the way all the conditions are combined).
> > 
> > I'm not much into the details of what we account for in SOC (I suppose
> > it's everything we insert in the preheader of the vector loop).
> 
> Yes, it seems to contain everything we insert prior to the loop in unfolded form.
> > 
> > +  if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > +fprintf (vect_dump, "not vectorized: estimated iteration count too small.");
> > +  if (vect_print_dump_info (REPORT_DETAILS))
> > +fprintf (vect_dump, "not vectorized: estimated iteration count smaller than "
> > + "user specified loop bound parameter or minimum "
> > + "profitable iterations (whichever is more conservative).");
> > 
> > this won't work anymore btw - dumping infrastructure changed.
> 
> Ah, will update that.
> > 
> > I suppose your patch is a step in the right direction, but to really
> > make progress we need to re-organize the loop and predicate structure
> > produced by the vectorizer.
> 
> This reminds me of what I did for string functions on x86. It gets very hard
> to get all the paths right when one starts to be really careful to not
> output too much cruft on the short paths + do not consume too many registers.
> 
> In fact I want to re-think this for the SSE string ops patch, so I may try to
> look into that incrementally.
> > 
> > So, please update your patch, re-test and then it's ok.
> 
> Thanks.
> > > I tested enabling loop_ch in early passes with -fprofile-feedback and it is
> > > SPEC neutral.  Given that it improves loop count estimates, I would still
> > > like mainline doing that.  I do not like these quite important estimates to
> > > be wrong most of time.
> > 
> > I agree.  It also helps getting rid of once rolling loops I think.
> 
> I am attaching the patch for early-ch.  Will commit it tomorrow.
> 
> Concerning jump threading, it would help to make some of it during early
> passes so the profile estimate does not get invalidated.  I tried to move VRP
> early but now it makes the compiler hang during bootstrap.  I will debug that.
> > 
> > > > 
> > > > Btw, I added a "similar" check in vect_analyze_loop_operations:
> > > > 
> > > >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >&& (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
> > > >   || ((max_niter = max_stmt_executions_int (loop)) != -1
> > > >   && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> > > > {
> > > >   if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >  "not vectorized: iteration count too small.");
> > > >   if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >  "not vectorized: iteration count smaller than "
> > > >  "vectorization factor.");
> > > >   return false;
> > > > }
> > > > 
> > > > maybe you simply need to update that to also consider the profile?
> > > 
> > > Hmm, I am still getting familiar with the code. Later we have
> > >   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >   && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> > > {
> > >   if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > > fprintf (vect_dump, "not vectorized: vectorization not "
> > >  "profitable.");
> > >   if (vect_print_dump_info (REPORT_DETAILS))
> > > fprintf (vect_dump, "not vectorized: iteration count smaller than "
> > >  "user specified loop bound parameter or minimum "
> > >  "profitable iterations (whichever is more conservative).");
> > >   return false;
> > > }
> > > 
> > > where th is always greater or equal than vectorization_factor from the
> > > cost model.  So this test seems redundant if the max_stmt_executions_int
> > > was pushed down to the second conditional?
> > 
> > Yes, sort of.  The new check was supposed to be crystal clear, and
> > even with the cost model disabled we want to not vectorize in this
> > case.  But yes, the whole cost-model stuff needs TLC.
> 
> Ah yes, without cost model we would skip it.  I suppose we do not need to
> bother with the profile estimate in that case anyway.  They are kind of part
> of the cost models.
> 
>   * passes.c (init_optimization_passes): Schedule early CH.
> 

Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-10-04 Thread Iain Buclaw
On 4 October 2012 16:51, Mike Stump  wrote:
> On Oct 4, 2012, at 6:06 AM, Iain Buclaw  wrote:
>> I would like to get a bump on this.
>>
>> It's been a while, and there have been quite a number of changes since
>> the initial post that address many of the issues raised.  Rather than
>> reposting patches, someone mentioned attaching changelog, well, here
>> it is.
>>
>> Repository is still located here: https://github.com/D-Programming-GDC/GDC
>>
>> Would it be possible to have a re-newed review?
>
> You don't ask, you post.  If you have independent patches to the rest of gcc
> that improve it, I'd suggest posting those as separate patches and getting
> those in.  This gives you a base onto which to slot in the front-end.

Hi Mike,

The only patches to gcc proper are documentation-related and adding
the D frontend / libphobos to configure and make files.  I would have
thought that these would typically only be included with the actual
front-end?


Regards
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #2c

2012-10-04 Thread Gunther Nikl
Michael Meissner wrote:
> On Tue, Oct 02, 2012 at 10:13:25AM +0200, Gunther Nikl wrote:
>> Michael Meissner wrote:
>>> Segher Boessenkool asked me on IRC to break out the fix in the last change.
>>> This patch is just the change to set the default options if the user did not
>>> use -mcpu= and the compiler was not configured with --with-cpu=.
>>> Here are the patches.
>> Which GCC releases are affected by this bug?
> 
> All of them.

So this bug is as old as the rs6000 port has PowerPC support? Then GCC
2.95 is also affected?

> Now, in general users don't see this bug, because distribution maintainers
> usually build with an explicit --with-cpu= option, which sets the default
> CPU in case the user did not use -mcpu= on the command line.  If neither
> option was used, the default "powerpc" or "powerpc64" is usually good enough.

I am not a distribution user. I have a private PPC port which I always
build without an explicit --with-cpu= option. This option seemed to be
redundant with PROCESSOR_DEFAULT and TARGET_DEFAULT in the target
config file. I will alter my build procedure.

Regards,
Gunther



[PATCH] Improve var-tracking memory disambiguation with frame pointer (PR debug/54796)

2012-10-04 Thread Jakub Jelinek
Hi!

This patch fixes a few FAILs in the ix86 guality testsuite (mainly -Os),
by better disambiguating sp based VALUEs (which usually have no MEM_EXPR
and thus the alias Oracle can't be used for them) from frame pointer
based ones or global vars.
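
The mechanism is a one-bit flag on the VALUE that the alias code consults before anything else. Here is a minimal sketch of that pattern with toy types; none of the toy_ names exist in GCC, and the real patch reuses the existing jump bit of the rtx rather than a dedicated structure.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a VALUE rtx with two of its one-bit flags.  */
struct toy_value
{
  unsigned int jump : 1;        /* Reused as "based on stack pointer".  */
  unsigned int unchanging : 1;  /* Reused as "preserved".  */
};

enum toy_base
{
  TOY_BASE_UNKNOWN,
  TOY_BASE_STACK_POINTER
};

/* Mark V as based on the stack pointer.  */
void
toy_set_value_sp_based (struct toy_value *v)
{
  assert (v != NULL);
  v->jump = 1;
}

/* Test whether V is marked as based on the stack pointer.  */
int
toy_sp_based_value_p (const struct toy_value *v)
{
  assert (v != NULL);
  return v->jump;
}

/* A marked value gets the stack pointer as its base term without
   looking at any of its locations.  */
enum toy_base
toy_find_base_term (const struct toy_value *v)
{
  if (toy_sp_based_value_p (v))
    return TOY_BASE_STACK_POINTER;
  return TOY_BASE_UNKNOWN;
}
```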

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-10-04  Jakub Jelinek  

PR debug/54796
* rtl.h: Document jump flag on VALUE.
* cselib.h (cselib_set_value_sp_based,
cselib_sp_based_value_p): New prototypes.
* alias.c (find_base_term): For cselib_sp_based_value_p
return static_reg_base_value[STACK_POINTER_REGNUM].
* cselib.c (SP_BASED_VALUE_P): Define.
(cselib_set_value_sp_based, cselib_sp_based_value_p): New functions.
* var-tracking.c (add_stores): Call cselib_set_value_sp_based
for not yet preserved VALUEs of sp on sp assignments if
hard_frame_pointer_adjustment != -1.
(vt_initialize): When setting hard_frame_pointer_adjustment,
disassociate sp from its previous value and call
cselib_set_value_sp_based on a new VALUE created for sp.

* gcc.dg/guality/pr54796.c: New test.

--- gcc/rtl.h.jj2012-09-28 14:16:57.0 +0200
+++ gcc/rtl.h   2012-09-28 14:16:57.0 +0200
@@ -267,7 +267,8 @@ struct GTY((chain_next ("RTX_NEXT (&%h)"
  1 in a CALL_INSN if it is a sibling call.
  1 in a SET that is for a return.
  In a CODE_LABEL, part of the two-bit alternate entry field.
- 1 in a CONCAT is VAL_EXPR_IS_COPIED in var-tracking.c.  */
+ 1 in a CONCAT is VAL_EXPR_IS_COPIED in var-tracking.c.
+ 1 in a VALUE is SP_BASED_VALUE_P in cselib.c.  */
   unsigned int jump : 1;
   /* In a CODE_LABEL, part of the two-bit alternate entry field.
  1 in a MEM if it cannot trap.
--- gcc/cselib.h.jj 2012-03-06 17:02:12.0 +0100
+++ gcc/cselib.h2012-10-04 08:42:04.879335722 +0200
@@ -99,6 +99,8 @@ extern void cselib_preserve_only_values
 extern void cselib_preserve_cfa_base_value (cselib_val *, unsigned int);
 extern void cselib_add_permanent_equiv (cselib_val *, rtx, rtx);
 extern bool cselib_have_permanent_equivalences (void);
+extern void cselib_set_value_sp_based (cselib_val *);
+extern bool cselib_sp_based_value_p (cselib_val *);
 
 extern void dump_cselib_table (FILE *);
 
--- gcc/alias.c.jj  2012-09-12 10:57:03.0 +0200
+++ gcc/alias.c 2012-10-04 08:45:22.893221350 +0200
@@ -1641,6 +1641,9 @@ find_base_term (rtx x)
   if (!val)
return ret;
 
+  if (cselib_sp_based_value_p (val))
+   return static_reg_base_value[STACK_POINTER_REGNUM];
+
   f = val->locs;
   /* Temporarily reset val->locs to avoid infinite recursion.  */
   val->locs = NULL;
--- gcc/cselib.c.jj 2012-08-24 23:44:03.0 +0200
+++ gcc/cselib.c2012-10-04 08:45:12.722278664 +0200
@@ -210,6 +210,9 @@ void (*cselib_record_sets_hook) (rtx ins
 #define PRESERVED_VALUE_P(RTX) \
   (RTL_FLAG_CHECK1("PRESERVED_VALUE_P", (RTX), VALUE)->unchanging)
 
+#define SP_BASED_VALUE_P(RTX) \
+  (RTL_FLAG_CHECK1("SP_BASED_VALUE_P", (RTX), VALUE)->jump)
+
 
 
 /* Allocate a struct elt_list and fill in its two elements with the
@@ -739,6 +742,23 @@ cselib_preserve_only_values (void)
   gcc_assert (first_containing_mem == &dummy_val);
 }
 
+/* Arrange for a value to be marked as based on stack pointer
+   for find_base_term purposes.  */
+
+void
+cselib_set_value_sp_based (cselib_val *v)
+{
+  SP_BASED_VALUE_P (v->val_rtx) = 1;
+}
+
+/* Test whether a value is marked as based on stack pointer.  */
+
+bool
+cselib_sp_based_value_p (cselib_val *v)
+{
+  return SP_BASED_VALUE_P (v->val_rtx);
+}
+
 /* Return the mode in which a register was last set.  If X is not a
register, return its mode.  If the mode in which the register was
set is not known, or the value was already clobbered, return
--- gcc/var-tracking.c.jj   2012-10-03 09:01:36.297902370 +0200
+++ gcc/var-tracking.c  2012-10-04 15:35:57.276697832 +0200
@@ -5769,6 +5769,11 @@ add_stores (rtx loc, const_rtx expr, voi
 
   resolve = preserve = !cselib_preserved_value_p (v);
 
+  if (loc == stack_pointer_rtx
+  && hard_frame_pointer_adjustment != -1
+  && preserve)
+cselib_set_value_sp_based (v);
+
   nloc = replace_expr_with_values (oloc);
   if (nloc)
 oloc = nloc;
@@ -9866,6 +9871,19 @@ vt_initialize (void)
{
  vt_init_cfa_base ();
  hard_frame_pointer_adjustment = fp_cfa_offset;
+ /* Disassociate sp from fp now.  */
+ if (MAY_HAVE_DEBUG_INSNS)
+   {
+ cselib_val *v;
+ cselib_invalidate_rtx (stack_pointer_rtx);
+ v = cselib_lookup (stack_pointer_rtx, Pmode, 1,
+VOIDmode);
+ if (v && !cselib_preserved_value_p (v))
+   {
+ 

[PATCH] Teach VRP to handle if ((unsigned_narrowing_cast) x != 0) similarly to if ((x & 0xffff) != 0) (PR tree-optimization/54810)

2012-10-04 Thread Jakub Jelinek
Hi!

This patch handles unsigned narrowing casts the same as
BIT_AND_EXPR with the unsigned narrow type's max value.
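
As a rough illustration of the equivalence (a minimal standalone sketch,
not taken from the patch or from the PR testcase; both function names are
made up for this example), the two source forms below compute the same
result, so a range derived for one should also hold for the other:

```c
#include <limits.h>

/* Hypothetical helpers, for illustration only.  */

/* The form this patch is about: compare the result of a narrowing
   cast to an unsigned type.  */
int
low_bits_set_cast (unsigned int x)
{
  return (unsigned short) x != 0;
}

/* The form that was already handled: mask with the max value of the
   narrow unsigned type.  */
int
low_bits_set_mask (unsigned int x)
{
  return (x & USHRT_MAX) != 0;
}
```

The conversion to unsigned short reduces x modulo USHRT_MAX + 1, which
keeps the same bits as the mask does.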

Bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2012-10-04  Jakub Jelinek  

PR tree-optimization/54810
* tree-vrp.c (register_edge_assert_for_2): Handle
NAME = (unsigned) NAME2; if (NAME cmp CST) for
narrowing casts to unsigned integral type like
NAME = NAME2 & CST2; if (NAME cmp CST) where CST2
is the max value of the unsigned integral type.

--- gcc/tree-vrp.c.jj   2012-09-25 14:45:48.0 +0200
+++ gcc/tree-vrp.c  2012-10-04 11:43:32.334988401 +0200
@@ -4712,6 +4712,11 @@ register_edge_assert_for_2 (tree name, e
   tree val2 = NULL_TREE;
   double_int mask = double_int_zero;
   unsigned int prec = TYPE_PRECISION (TREE_TYPE (val));
+  unsigned int nprec = prec;
+  enum tree_code rhs_code = ERROR_MARK;
+
+  if (is_gimple_assign (def_stmt))
+   rhs_code = gimple_assign_rhs_code (def_stmt);
 
   /* Add asserts for NAME cmp CST and NAME being defined
 as NAME = (int) NAME2.  */
@@ -4721,7 +4726,7 @@ register_edge_assert_for_2 (tree name, e
  && gimple_assign_cast_p (def_stmt))
{
  name2 = gimple_assign_rhs1 (def_stmt);
- if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))
+ if (CONVERT_EXPR_CODE_P (rhs_code)
  && INTEGRAL_TYPE_P (TREE_TYPE (name2))
  && TYPE_UNSIGNED (TREE_TYPE (name2))
  && prec == TYPE_PRECISION (TREE_TYPE (name2))
@@ -4767,8 +4772,7 @@ register_edge_assert_for_2 (tree name, e
 NAME = NAME2 >> CST2.
 
 Extract CST2 from the right shift.  */
-  if (is_gimple_assign (def_stmt)
- && gimple_assign_rhs_code (def_stmt) == RSHIFT_EXPR)
+  if (rhs_code == RSHIFT_EXPR)
{
  name2 = gimple_assign_rhs1 (def_stmt);
  cst2 = gimple_assign_rhs2 (def_stmt);
@@ -4840,21 +4844,37 @@ register_edge_assert_for_2 (tree name, e
   /* Add asserts for NAME cmp CST and NAME being defined as
 NAME = NAME2 & CST2.
 
-Extract CST2 from the and.  */
+Extract CST2 from the and.
+
+Also handle
+NAME = (unsigned) NAME2;
+casts where NAME's type is unsigned and has smaller precision
+than NAME2's type as if it was NAME = NAME2 & MASK.  */
   names[0] = NULL_TREE;
   names[1] = NULL_TREE;
   cst2 = NULL_TREE;
-  if (is_gimple_assign (def_stmt)
- && gimple_assign_rhs_code (def_stmt) == BIT_AND_EXPR)
+  if (rhs_code == BIT_AND_EXPR
+ || (CONVERT_EXPR_CODE_P (rhs_code)
+ && TREE_CODE (TREE_TYPE (val)) == INTEGER_TYPE
+ && TYPE_UNSIGNED (TREE_TYPE (val))
+ && TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (def_stmt)))
+> prec
+ && !retval))
{
  name2 = gimple_assign_rhs1 (def_stmt);
- cst2 = gimple_assign_rhs2 (def_stmt);
+ if (rhs_code == BIT_AND_EXPR)
+   cst2 = gimple_assign_rhs2 (def_stmt);
+ else
+   {
+ cst2 = TYPE_MAX_VALUE (TREE_TYPE (val));
+ nprec = TYPE_PRECISION (TREE_TYPE (name2));
+   }
  if (TREE_CODE (name2) == SSA_NAME
  && INTEGRAL_TYPE_P (TREE_TYPE (name2))
  && TREE_CODE (cst2) == INTEGER_CST
  && !integer_zerop (cst2)
- && prec <= HOST_BITS_PER_DOUBLE_INT
- && (prec > 1
+ && nprec <= HOST_BITS_PER_DOUBLE_INT
+ && (nprec > 1
  || TYPE_UNSIGNED (TREE_TYPE (val
{
  gimple def_stmt2 = SSA_NAME_DEF_STMT (name2);
@@ -4881,12 +4901,12 @@ register_edge_assert_for_2 (tree name, e
  bool valid_p = false, valn = false, cst2n = false;
  enum tree_code ccode = comp_code;
 
- valv = tree_to_double_int (val).zext (prec);
- cst2v = tree_to_double_int (cst2).zext (prec);
+ valv = tree_to_double_int (val).zext (nprec);
+ cst2v = tree_to_double_int (cst2).zext (nprec);
  if (!TYPE_UNSIGNED (TREE_TYPE (val)))
{
- valn = valv.sext (prec).is_negative ();
- cst2n = cst2v.sext (prec).is_negative ();
+ valn = valv.sext (nprec).is_negative ();
+ cst2n = cst2v.sext (nprec).is_negative ();
}
  /* If CST2 doesn't have most significant bit set,
 but VAL is negative, we have comparison like
@@ -4894,7 +4914,7 @@ register_edge_assert_for_2 (tree name, e
  if (!cst2n && valn)
ccode = ERROR_MARK;
  if (cst2n)
-   sgnbit = double_int_one.llshift (prec - 1, prec).zext (prec);
+   sgnbit = double_int_one.llshift (nprec - 1, nprec).zext (nprec);
  else
sgnbit = double_int_zero;
  minv = valv & cst2v;
@@ -4906,12 +4926,12 @@ register_edge_assert_for_2 (tree name, e
 h

Re: [PATCH] Improve var-tracking memory disambiguation with frame pointer (PR debug/54796)

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 6:33 PM, Jakub Jelinek wrote:
> This patch fixes a few FAILs in the ix86 guality testsuite (mainly -Os),
> by better disambiguating sp based VALUEs (which usually have no MEM_EXPR
> and thus the alias Oracle can't be used for them) from frame pointer
> based ones or global vars.

Hi,

Does this also help for some of the var-tracking compile-time-hog
bugs? ISTR PR53958 was one with very long chains of sp based values...

Ciao!
Steven


Re: [patch][lra] Improve initial program point density in lra-lives.c (was: Re: RFC: LRA for x86/x86-64 [7/9])

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 5:31 PM, Vladimir Makarov  wrote:
>
> Wow.  I did not have such effect.  What machine do you use?

I do all my testing on gcc17.

Ciao!
Steven


Re: patch to fix

2012-10-04 Thread Marc Glisse

On Thu, 4 Oct 2012, Kenneth Zadeck wrote:


On 10/04/2012 09:17 AM, Marc Glisse wrote:

On Wed, 3 Oct 2012, Mike Stump wrote:

On Oct 3, 2012, at 1:47 PM, Marc Glisse  wrote:
did you consider making the size of wide_int a template parameter, now 
that we are using C++? All with a convenient typedef or macro so it 
doesn't show. I am asking because in vrp I do some arithmetic that 
requires 2*N+1 bits where N is the size of double_int.


No, not really.  I'd maybe answer it this way, we put in a type (singular) 
to support all integral constants in all languages on a port. Since we 
only needed 1, there was little need to templatize it.  By supporting all 
integral constants in all languages, there is little need for more.  If 
Ada say, wanted a 2048 bit integer, then, we just have it drop off the 
size it wants someplace and we would mix that in on a MAX(….) line, net 
result, the type we use would then directly support the needs of Ada.  If 
VRP wanted 2x of all existing modes, we could simply change the MAX 
equation and essentially double it, if people need that.  This comes at a 
cost, as the intermediate wide values are fixed size allocated (not 
variable), so these all would be larger.


And this cost could be eliminated by having a template wide_int_ so only 
the places that need it actually use the extra size ;-)


The space is not really an issue in most places since wide-ints tend to be 
short lived.


You were the one talking of a cost.

However, the real question is what are you going to instantiate the template 
on?  What we do is look at the target and determine the largest type that 
the target supports and build a wide int type that supports that.  How are 
you going to do better?


In a single place in tree-vrp.c in the code that evaluates 
multiplications, I would instantiate the template on the double (possibly 
+1) of the value you selected as large enough for all constants. For all 
the rest, your type is fine.
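
A small sketch of why a multiplication wants about twice the width of its
operands. This is an illustration only: 32 bits stands in for the width of
double_int, the function names are hypothetical, and the extra "+1" bit
mentioned above is not modelled here.

```c
#include <stdint.h>

/* Illustration only: N = 32 stands in for the width of double_int.  */

/* Full product of two 32-bit values; it always fits in 64 bits.  */
uint64_t
mul_wide (uint32_t a, uint32_t b)
{
  return (uint64_t) a * (uint64_t) b;
}

/* The same product truncated back to 32 bits, i.e. what is left when
   the extra width is not available.  */
uint32_t
mul_narrow (uint32_t a, uint32_t b)
{
  return (uint32_t) ((uint64_t) a * (uint64_t) b);
}
```

With both operands at their maximum, mul_wide keeps the whole product
while mul_narrow keeps only its low word, so range bounds computed in the
narrow width can be wrong.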


This will be for discussion when you submit that next patch, but currently 
VRP handles integers the same size as double_int. In particular, it handles 
__int128. I would be unhappy if introducing a larger bigint type in gcc 
made us regress there.


You are only happy now because you do not really understand the world around 
you.


I did not want to go into details, but let me re-phrase: I do not want to 
regress. Currently, hosts with a 64 bit hwi can handle VRP multiplications 
on __int128. If your patch introducing better big integers breaks that, 
that sounds bad to me, since I would expect s/double_int/wide_int/ to just 
work, and using wide_int<2*MAX> would just be a potential simplification 
of the code for later.



Note that VRP is just the one case I am familiar with. Using templates 
should (I haven't checked) be completely trivial and help the next person 
who needs bigger integers for a specific purpose and doesn't want to 
penalize the whole compiler. If the size of wide_int is completely 
irrelevant and we can make it 10 times larger without thinking, I guess 
some numbers showing it would be great (or maybe that's common 
knowledge, then I guess it is fine).



Now those are only some comments from an occasional contributor, not 
reviewer requirements, it is fine to ignore them.


--
Marc Glisse


Re: [patch][lra] Improve initial program point density in lra-lives.c

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 6:12 PM, Vladimir Makarov  wrote:
>>> 0.6% sounds really very different from my timings. How much time does
>>> create_start_finish_chains take for you?
>>>
>> 0.65% (2.78s).
>>
>> Actually, I have a profile but I am not sure now that it is for PR54146.
>> It might be for PR26854.
>>
>> I'll check it again to be sure.
>
> No, it looks about the same.

Well, that's very strange. Maybe we measure these things differently?
I just hijack a timevar, so I measure e.g. the time spent in
create_start_finish_chains like so:

Index: lra-lives.c
===
--- lra-lives.c (revision 192052)
+++ lra-lives.c (working copy)
@@ -770,6 +812,7 @@ create_start_finish_chains (void)
   int i, max_regno;
   lra_live_range_t r;

+timevar_push (TV_CPROP);
   lra_start_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
   lra_finish_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
   max_regno = max_reg_num ();
@@ -783,6 +826,7 @@ create_start_finish_chains (void)
  lra_finish_point_ranges[r->finish] = r;
}
 }
+timevar_pop (TV_CPROP);
 }

 /* Rebuild LRA_START_POINT_RANGES and LRA_FINISH_POINT_RANGES after


so that I get the timings in the -ftime-report like so:

 CPROP   :  43.14 ( 4%) usr
 integrated RA   : 200.81 (17%) usr
 LRA non-specific:  62.18 ( 5%) usr
 LRA virtuals elimination:  61.71 ( 5%) usr
 LRA reload inheritance  :   6.41 ( 1%) usr
 LRA create live ranges  :  139.75 (13%) usr
 LRA hard reg assignment : 130.90 (11%) usr
 LRA coalesce pseudo regs:   2.45 ( 0%) usr
 reload  :   9.09 ( 1%) usr

"Crude, but efficient" (tm) :-)

How do you measure the time spent in that function, and in
remove_some_program_points_and_update_live_ranges?

Ciao!
Steven


Re: patch to fix constant math

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 3:55 PM, Kenneth Zadeck  wrote:
> Let me talk about the mode here first.
>
> What this interface/patch provides is a facility where the constant math
> that is done in optimizations is done exactly the way that it would be done
> on the target machine.   What we have now is a compiler that only does this
> if it is convenient to do on the host.   I admit that i care about this more
> than others right now, but if intel adds a couple of more instructions to
> their vector units, other people will start to really care about this issue.
> If you take an OImode value with the current compiler and left shift it by
> 250 the middle end will say that the result is 0.   This is just wrong!!!
>
> What this means is that the bitsize and precision of the operations need to
> be carried along when doing math.  When wide-int checks for overflow on the
> multiply or add, it is not checking if the value overflowed on two HWIs,
> it is checking if the add overflowed in the mode of the types that are
> represented on the target.   When we do shift, we are not doing a shift
> within two HWIs, we are truncating the shift value (if this is appropriate)
> according to the bitsize and shifting according to the precision.
>
> I think that an argument could be made that storing the mode should be
> changed to an explicit precision and bitsize.  (A possible other option
> would be to store a tree type, but this would make the usage at the rtl
> level very cumbersome since types are rare.) Aside from the work, you would
> not get much push back.
>
> But the signedness is a different argument.   At the rtl level, the signedness is
> a matter of context.   (you could argue that this is a mistake and i would
> agree, but that is an even bigger change.)   But more to the point, at the
> tree level, there are a surprising number of places where the operation
> desired does not follow the sign of the types that were used to construct
> the constants.   Furthermore, not carrying the sign is more consistent with
> the double int code, which as you point out carries nothing.

Well, on RTL the signedness is on the operation (you have sdiv and udiv, etc.).

double-int tries to present a sign-less twos-complement entity of size
2 * HOST_BITS_PER_WIDE_INT.  I think that is sensible and for
obvious reasons should not change.  Both tree and RTL rely on this.
What we do not want is that up to TImode you get an internal representation
done one way (twos-complement) and on OImode and larger you
suddenly get subtly different behavior.  That's a recipe for disaster.

I'd like to clean up the interface to double-int some more (now with the
nice C++ stuff we have).  double-int should be pure twos-complement,
there should be no operations on double-ints that behave differently
when done signed or unsigned, instead we have signed and unsigned
versions of the operations (similar to how signedness is handled on
the RTL level).  With some trivial C++ fu you could have a
double_sint and double_uint type that would get rid of the bool
sign params we have to some functions (and then you could
write double_sint >> n using operator notation).
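
To make the "sign-less value, signed and unsigned operations" idea
concrete, here is a minimal C sketch. It is an illustration only: the
struct and function names are hypothetical and are not the double_int
interface, and 64-bit words are assumed in place of HOST_WIDE_INT.

```c
#include <stdint.h>

/* Illustration only: a sign-less two-word value.  The names are
   hypothetical and are not the double_int API.  */
struct dword
{
  uint64_t low;
  uint64_t high;
};

/* Logical (unsigned) right shift by N, for 0 < N < 64.  */
struct dword
dword_lshr (struct dword x, unsigned int n)
{
  struct dword r;
  r.low = (x.low >> n) | (x.high << (64 - n));
  r.high = x.high >> n;
  return r;
}

/* Arithmetic (signed) right shift by N, for 0 < N < 64: the same bits
   go in, but the top word is filled with copies of the sign bit.  */
struct dword
dword_ashr (struct dword x, unsigned int n)
{
  struct dword r = dword_lshr (x, n);
  if (x.high >> 63)
    r.high |= ~UINT64_C (0) << (64 - n);
  return r;
}
```

The value itself carries no sign; the caller picks the signed or the
unsigned operation, much like choosing between sdiv and udiv on RTL.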

I'd like wide-int (whatever its internal representation is) to behave
exactly like double-ints with respect to precision and signedness
handling.  Ideally all static functions we have that operate on
double-ints would be 1:1 available for wide-ints, so I can change
the type of entities in an algorithm from double-ints to wide-ints
(or vice versa) and do not have to change the code at all.

Thus as first step I'd like you to go over the double-int stuff,
compare it to the wide-int stuff you introduce and point out
differences (changing double-ints or wide-ints to whatever is
the more general concept).

Now, as for 'modes' - similar to signedness some functions
that operate on double-ints take a precision argument (like
the various extensions).  You can add a similar wrapper
type like double_sint, but this time with a cost - a new precision
member, that can be constructed from a double_int (or wide_int)
that ends up specifying the desired precision (be it in terms
of a mode or a type).

You didn't question my suggestion to have the number of
HOST_WIDE_INTs in a wide-int be compile-time constant - was
that just an oversight on your side?  The consequence is that
code wanting to deal with arbitrary length wide-ints needs to
be a template.

> As for the splitting out the patch in smaller pieces, i am all for it.   I
> have done this twice already and i could get the const_scalar_int_p patch
> out quickly.  But you do not get too far along that before you are still
> left with a big patch.   I could split out wide-int.* and just commit those
> files with no clients as a first step.   My guess is that Richard Sandiford
> would appreciate that because while he has carefully checked the rtl stuff,
> i think that the code inside wide-int is not in his comfort zone of things
> he would approve.
>
> As far as your bt

Adjust gcc.dg/lto/20120723_0.c for SPARC

2012-10-04 Thread Eric Botcazou
As noted by Martin in http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00116.html, 
the testcase is invalid C and cannot pass on SPARC 32-bit because of the ABI.

Tested on SPARC/Solaris 10, applied on the mainline.


2012-10-04  Eric Botcazou  

* gcc.dg/lto/20120723_0.c: Skip on SPARC 32-bit.


-- 
Eric Botcazou
Index: gcc.dg/lto/20120723_0.c
===
--- gcc.dg/lto/20120723_0.c	(revision 192073)
+++ gcc.dg/lto/20120723_0.c	(working copy)
@@ -1,7 +1,9 @@
 /* Make sure that by reference and by value aggregate jump functions do not get
-   mixed up.  */
+   mixed up.
+   ??? This testcase is invalid C and can only pass on specific platforms.  */
 /* { dg-lto-do run } */
-/* { dg-lto-options {{-O3 -fno-early-inlining -flto}} } */
+/* { dg-skip-if "" { { sparc*-*-* } && ilp32 } { "*" } { "" } } */
+/* { dg-lto-options { {-O3 -fno-early-inlining -flto}} } */
 
 extern void abort (void);
 


[Patch, Fortran] Fix some memory leaks

2012-10-04 Thread Tobias Burnus
This patch fixes some memory leaks and other issues found by 
http://scan5.coverity.com.


Built and regtested on x86-64-linux.
OK for the trunk?

Tobias

2012-10-04  Tobias Burnus  

	* expr.c (scalarize_intrinsic_call): Plug memory leak.
	* frontend-passes.c (gcc_assert): Extend assert.
	* interface.c (gfc_compare_derived_types): Fix comparison.
	(gfc_check_operator_interface): Move up to make this error
	message reachable.
	(get_sym_storage_size): Remove always-true checks.
	* io.c (format_lex): Add comment.
	(gfc_free_wait): Free memory.
	* match.c (gfc_match_select_type): Ditto. 
	* matchexpr.c (match_level_3): Ditto.
	* primary.c (match_string_constant): Ditto.
	(match_actual_arg): Check return value.
	* resolve.c (gfc_resolve_substring_charlen,
	resolve_typebound_generic_call, resolve_typebound_function,
	resolve_typebound_subroutine): Free memory.
	* trans-types.c (gfc_get_derived_type): Remove always-true check.

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 4bba438..9ac0fc6 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -2059,6 +2059,8 @@ scalarize_intrinsic_call (gfc_expr *e)
 
   free_expr0 (e);
   *e = *expr;
+  /* Free "expr" but not the pointers it contains.  */
+  free (expr);
   gfc_free_expr (old);
   return SUCCESS;
 
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index 437ed7e..0cba911 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -1177,7 +1177,7 @@ optimize_trim (gfc_expr *e)
   /* Set the end of the reference to the call to len_trim.  */
 
   ref->u.ss.end = fcn;
-  gcc_assert (*rr == NULL);
+  gcc_assert (rr != NULL && *rr == NULL);
   *rr = ref;
   return true;
 }
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 6bcd607..fb3da1f 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -449,7 +449,7 @@ gfc_compare_derived_types (gfc_symbol *derived1, gfc_symbol *derived2)
   /* Make sure that link lists do not put this function into an 
 	 endless recursive loop!  */
   if (!(dt1->ts.type == BT_DERIVED && derived1 == dt1->ts.u.derived)
-	&& !(dt1->ts.type == BT_DERIVED && derived1 == dt1->ts.u.derived)
+	&& !(dt2->ts.type == BT_DERIVED && derived2 == dt2->ts.u.derived)
 	&& gfc_compare_types (&dt1->ts, &dt2->ts) == 0)
 	return 0;
 
@@ -641,8 +641,12 @@ gfc_check_operator_interface (gfc_symbol *sym, gfc_intrinsic_op op,
 && op != INTRINSIC_NOT)
   || (args == 2 && op == INTRINSIC_NOT))
 {
-  gfc_error ("Operator interface at %L has the wrong number of arguments",
-		 &sym->declared_at);
+  if (op == INTRINSIC_ASSIGN)
+	gfc_error ("Assignment operator interface at %L must have "
+		   "two arguments", &sym->declared_at);
+  else
+	gfc_error ("Operator interface at %L has the wrong number of arguments",
+		   &sym->declared_at);
   return false;
 }
 
@@ -656,12 +660,6 @@ gfc_check_operator_interface (gfc_symbol *sym, gfc_intrinsic_op op,
 		 "a SUBROUTINE", &sym->declared_at);
 	  return false;
 	}
-  if (args != 2)
-	{
-	  gfc_error ("Assignment operator interface at %L must have "
-		 "two arguments", &sym->declared_at);
-	  return false;
-	}
 
   /* Allowed are (per F2003, 12.3.2.1.2 Defined assignments):
 	 - First argument an array with different rank than second,
@@ -2149,7 +2147,7 @@ get_sym_storage_size (gfc_symbol *sym)
 return 0;
   for (i = 0; i < sym->as->rank; i++)
 {
-  if (!sym->as || sym->as->upper[i]->expr_type != EXPR_CONSTANT
+  if (sym->as->upper[i]->expr_type != EXPR_CONSTANT
 	  || sym->as->lower[i]->expr_type != EXPR_CONSTANT)
 	return 0;
 
@@ -2224,9 +,7 @@ get_expr_storage_size (gfc_expr *e)
 	  continue;
 	}
 
-  if (ref->type == REF_ARRAY && ref->u.ar.type == AR_SECTION
-	  && ref->u.ar.start && ref->u.ar.end && ref->u.ar.stride
-	  && ref->u.ar.as->upper)
+  if (ref->type == REF_ARRAY && ref->u.ar.type == AR_SECTION)
 	for (i = 0; i < ref->u.ar.dimen; i++)
 	  {
 	long int start, end, stride;
diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
index 428799c..447d03f 100644
--- a/gcc/fortran/io.c
+++ b/gcc/fortran/io.c
@@ -243,6 +243,8 @@ format_lex (void)
 {
 case '-':
   negative_flag = 1;
+  /* Falls through.  */
+
 case '+':
   c = next_char_not_space (&error);
   if (!ISDIGIT (c))
@@ -4117,6 +4119,7 @@ gfc_free_wait (gfc_wait *wait)
   gfc_free_expr (wait->iostat);
   gfc_free_expr (wait->iomsg);
   gfc_free_expr (wait->id);
+  free (wait);
 }
 
 
diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index d46a495..06585af 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -5325,6 +5325,7 @@ gfc_match_select_type (void)
   char name[GFC_MAX_SYMBOL_LEN];
   bool class_array;
   gfc_symbol *sym;
+  gfc_namespace *parent_ns;
 
   m = gfc_match_label ();
   if (m == MATCH_ERROR)
@@ -5404,7 +5405,9 @@ gfc_match_select_type (void)
   return MATCH_YES;
   
 cleanup:
-  gfc_current_ns = gfc_current_ns->paren

Re: [lra] patch to solve most scalability problems for LRA

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov  wrote:
>   The only issue now is PR54146 compilation time for IRA+LRA although it
> was improved significantly.  I will continue work on PR54146.  But now I
> am going to focus on proposals from reviews.

Right, there still are opportunities to improve things.

(The real solution may be to stop SRA from creating so many
simultaneously live pseudos in the first place...)

> +  lra_simple_p
> += (ira_use_lra_p && max_reg_num () >= (1 << 26) / last_basic_block);

I think you should use n_basic_blocks here instead of
last_basic_block, in case this runs without compacting the cfg first
(n_basic_blocks is the real number of basic blocks in the cfg,
last_basic_block is the highest index, so last_basic_block >=
n_basic_blocks).
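
A quick sketch of the difference this makes. It is an illustration only:
the function name is hypothetical, the ira_use_lra_p part of the condition
is left out, and the block count is passed in explicitly and assumed to be
greater than zero.

```c
/* Illustration only: the threshold from the quoted hunk, with the
   block count passed in explicitly.  */
int
use_simple_lra_p (int max_regno, int block_count)
{
  return max_regno >= (1 << 26) / block_count;
}
```

With 30000 pseudos and a cfg that has 1000 real blocks but a highest block
index of 4000, the test is false for 1000 and true for 4000, so the larger
count switches to the simple mode earlier than the actual number of blocks
would.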

Thanks for working on this!

Ciao!
Steven


Re: [PATCH] Improve var-tracking memory disambiguation with frame pointer (PR debug/54796)

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 06:42:47PM +0200, Steven Bosscher wrote:
> On Thu, Oct 4, 2012 at 6:33 PM, Jakub Jelinek wrote:
> > This patch fixes a few FAILs in the ix86 guality testsuite (mainly -Os),
> > by better disambiguating sp based VALUEs (which usually have no MEM_EXPR
> > and thus the alias Oracle can't be used for them) from frame pointer
> > based ones or global vars.
> 
> Does this also help for some of the var-tracking compile-time-hog
> bugs? ISTR PR53958 was one with very long chains of sp based values...

Very unlikely.  That PR has been reported against 4.7, which doesn't have
clobber_overlapping_mems at all.  The patch could only affect compile times
if there were huge chains of sp based values and find_base_term showed up
significantly in the profiles.
That PR seems to create VTA hash tables so huge that var-tracking gives up.
Unfortunately, the current give-up heuristic doesn't estimate the compile
time and memory needed accurately enough, and lowering the default of the
limit (which is a param) would, on the other hand, make sanely sized
programs fall back to no-VTA way too often.

Jakub


Re: [patch][lra] Improve initial program point density in lra-lives.c

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 6:56 PM, Steven Bosscher  wrote:
> "Crude, but efficient" (tm) :-)

BTW with a similar approach I also time other bits of process_bb_lives:

timevar_push (TV_HOIST);
  /* See if we'll need an increment at the end of this basic block.
 An increment is needed if the PSEUDOS_LIVE set is not empty,
 to make sure the finish points are set up correctly.  */
  need_curr_point_incr = (sparseset_cardinality (pseudos_live) > 0);
  EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
mark_pseudo_dead (i, curr_point);
timevar_pop (TV_HOIST);
timevar_push (TV_PRE);
  EXECUTE_IF_SET_IN_SPARSESET (pseudos_live_through_calls, i)
if (bitmap_bit_p (DF_LR_IN (bb), i))
  check_pseudos_live_through_calls (i);
timevar_pop (TV_PRE);

 PRE :  12.20 ( 1%) usr
 code hoisting   :  34.03 ( 3%) usr

IOW that's ~46s out of ~180s *not* spent in walking the insns in
process_bb_lives!

I think this is due to the opening/closing of chains for all pseudos
live through the basic block at the start and end of process_bb_lives.
For this test case the cardinality of pseudos_live is as high as
max_reg_num/2 at its peak.

I tried to keep pseudos_live open for bb->prev_bb if there is a
find_edge(bb->prev_bb,bb), but that's when I ran into a problem with
inconsistent liveness data (that's what this message was about:
http://gcc.gnu.org/ml/gcc/2012-10/msg00035.html).

Ciao!
Steven


Re: [Patch, Fortran] Fix some memory leaks

2012-10-04 Thread Steven Bosscher
On Thu, Oct 4, 2012 at 7:06 PM, Tobias Burnus wrote:
> This patch fixes some memory leaks and other issues found by
> http://scan5.coverity.com.
>
> Build and regtested on x86-64-linux.
> OK for the trunk?

Yes, thanks for plugging these!
Some of them have been there since day 0 :-)

Ciao!
Steven


Re: PATCH trunk: gengtype honoring mark_hook-s inside struct inside union-s

2012-10-04 Thread Basile Starynkevitch
On Thu, Oct 04, 2012 at 06:51:35PM +0300, Laurynas Biveinis wrote:
> > 2012-10-03  Basile Starynkevitch  
> >
> > * gengtype.c (walk_type): Emit mark_hook when inside a
> >   struct of a union member.
> 
> This is OK.

Thanks, committed as revision 192092 to trunk.


I believe this patch should be backported into GCC 4.7 and 4.6.

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: [lra] patch to solve most scalability problems for LRA

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 7:07 PM, Steven Bosscher  wrote:
> On Thu, Oct 4, 2012 at 5:37 PM, Vladimir Makarov  wrote:
>>   The only issue now is PR54146 compilation time for IRA+LRA although it
>> was improved significantly.  I will continue work on PR54146.  But now I
>> am going to focus on proposals from reviews.
>
> Right, there still are opportunities to improve things.
>
> (The real solution may be to stop SRA from creating so many
> simultaneously live pseudos in the first place...)
>
>> +  lra_simple_p
>> += (ira_use_lra_p && max_reg_num () >= (1 << 26) / last_basic_block);
>
> I think you should use n_basic_blocks here instead of
> last_basic_block, in case this runs without compacting the cfg first
> (n_basic_blocks is the real number of basic blocks in the cfg,
> last_basic_block is the highest index, so last_basic_block >=
> n_basic_blocks).

I also noticed that switching to IRA_REGION_ONE improves things
when we have a large number of loops (profile points to some
loop code in IRA).  Note that the magic number above should
be a new --param, and once we have a diagnostic flag that
shows whenever we back off like this it should notify the
user of that fact (and the params we have overflown) - this
just reminded me of that idea from somebody else ;)

> Thanks for working on this!

Indeed ;)

It, btw, also applies to IRA + reload ...

Richard.

> Ciao!
> Steven


Re: PATCH trunk: gengtype honoring mark_hook-s inside struct inside union-s

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 7:24 PM, Basile Starynkevitch
 wrote:
> On Thu, Oct 04, 2012 at 06:51:35PM +0300, Laurynas Biveinis wrote:
>> > 2012-10-03  Basile Starynkevitch  
>> >
>> > * gengtype.c (walk_type): Emit mark_hook when inside a
>> >   struct of a union member.
>>
>> This is OK.
>
> thanks, Committed revision 192092 to trunk.
>
>
> I believe this patch should be backported into GCC 4.7 and 4.6

I see no reason for this unless it is a regression.

Richard.

> Regards.
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basilestarynkevitchnet mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***


Re: [PATCH] Teach VRP to handle if ((unsigned_narrowing_cast) x != 0) similarly to if ((x & 0xffff) != 0) (PR tree-optimization/54810)

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 6:31 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch handles unsigned narrowing casts the same as
> BIT_AND_EXPR with the unsigned narrow type's max value.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?

Ok.

Thanks,
Richard.

> 2012-10-04  Jakub Jelinek  
>
> PR tree-optimization/54810
> * tree-vrp.c (register_edge_assert_for_2): Handle
> NAME = (unsigned) NAME2; if (NAME cmp CST) for
> narrowing casts to unsigned integral type like
> NAME = NAME2 & CST2; if (NAME cmp CST) where CST2
> is the max value of the unsigned integral type.
>
> --- gcc/tree-vrp.c.jj   2012-09-25 14:45:48.0 +0200
> +++ gcc/tree-vrp.c  2012-10-04 11:43:32.334988401 +0200
> @@ -4712,6 +4712,11 @@ register_edge_assert_for_2 (tree name, e
>tree val2 = NULL_TREE;
>double_int mask = double_int_zero;
>unsigned int prec = TYPE_PRECISION (TREE_TYPE (val));
> +  unsigned int nprec = prec;
> +  enum tree_code rhs_code = ERROR_MARK;
> +
> +  if (is_gimple_assign (def_stmt))
> +   rhs_code = gimple_assign_rhs_code (def_stmt);
>
>/* Add asserts for NAME cmp CST and NAME being defined
>  as NAME = (int) NAME2.  */
> @@ -4721,7 +4726,7 @@ register_edge_assert_for_2 (tree name, e
>   && gimple_assign_cast_p (def_stmt))
> {
>   name2 = gimple_assign_rhs1 (def_stmt);
> - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))
> + if (CONVERT_EXPR_CODE_P (rhs_code)
>   && INTEGRAL_TYPE_P (TREE_TYPE (name2))
>   && TYPE_UNSIGNED (TREE_TYPE (name2))
>   && prec == TYPE_PRECISION (TREE_TYPE (name2))
> @@ -4767,8 +4772,7 @@ register_edge_assert_for_2 (tree name, e
>  NAME = NAME2 >> CST2.
>
>  Extract CST2 from the right shift.  */
> -  if (is_gimple_assign (def_stmt)
> - && gimple_assign_rhs_code (def_stmt) == RSHIFT_EXPR)
> +  if (rhs_code == RSHIFT_EXPR)
> {
>   name2 = gimple_assign_rhs1 (def_stmt);
>   cst2 = gimple_assign_rhs2 (def_stmt);
> @@ -4840,21 +4844,37 @@ register_edge_assert_for_2 (tree name, e
>/* Add asserts for NAME cmp CST and NAME being defined as
>  NAME = NAME2 & CST2.
>
> -Extract CST2 from the and.  */
> +Extract CST2 from the and.
> +
> +Also handle
> +NAME = (unsigned) NAME2;
> +casts where NAME's type is unsigned and has smaller precision
> +than NAME2's type as if it was NAME = NAME2 & MASK.  */
>names[0] = NULL_TREE;
>names[1] = NULL_TREE;
>cst2 = NULL_TREE;
> -  if (is_gimple_assign (def_stmt)
> - && gimple_assign_rhs_code (def_stmt) == BIT_AND_EXPR)
> +  if (rhs_code == BIT_AND_EXPR
> + || (CONVERT_EXPR_CODE_P (rhs_code)
> + && TREE_CODE (TREE_TYPE (val)) == INTEGER_TYPE
> + && TYPE_UNSIGNED (TREE_TYPE (val))
> + && TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (def_stmt)))
> +> prec
> + && !retval))
> {
>   name2 = gimple_assign_rhs1 (def_stmt);
> - cst2 = gimple_assign_rhs2 (def_stmt);
> + if (rhs_code == BIT_AND_EXPR)
> +   cst2 = gimple_assign_rhs2 (def_stmt);
> + else
> +   {
> + cst2 = TYPE_MAX_VALUE (TREE_TYPE (val));
> + nprec = TYPE_PRECISION (TREE_TYPE (name2));
> +   }
>   if (TREE_CODE (name2) == SSA_NAME
>   && INTEGRAL_TYPE_P (TREE_TYPE (name2))
>   && TREE_CODE (cst2) == INTEGER_CST
>   && !integer_zerop (cst2)
> - && prec <= HOST_BITS_PER_DOUBLE_INT
> - && (prec > 1
> + && nprec <= HOST_BITS_PER_DOUBLE_INT
> + && (nprec > 1
>   || TYPE_UNSIGNED (TREE_TYPE (val
> {
>   gimple def_stmt2 = SSA_NAME_DEF_STMT (name2);
> @@ -4881,12 +4901,12 @@ register_edge_assert_for_2 (tree name, e
>   bool valid_p = false, valn = false, cst2n = false;
>   enum tree_code ccode = comp_code;
>
> - valv = tree_to_double_int (val).zext (prec);
> - cst2v = tree_to_double_int (cst2).zext (prec);
> + valv = tree_to_double_int (val).zext (nprec);
> + cst2v = tree_to_double_int (cst2).zext (nprec);
>   if (!TYPE_UNSIGNED (TREE_TYPE (val)))
> {
> - valn = valv.sext (prec).is_negative ();
> - cst2n = cst2v.sext (prec).is_negative ();
> + valn = valv.sext (nprec).is_negative ();
> + cst2n = cst2v.sext (nprec).is_negative ();
> }
>   /* If CST2 doesn't have most significant bit set,
>  but VAL is negative, we have comparison like
> @@ -4894,7 +4914,7 @@ register_edge_assert_for_2 (tree name, e
>   if (!cst2n && valn)
> ccode = ERROR_MARK;
>   if (cst2n)
> -   sgnbit = do
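
The comment added in the hunk above treats a narrowing cast to an unsigned type as if it were NAME = NAME2 & MASK. A minimal sketch of that equivalence follows, assuming an 8-bit unsigned target type and a 32-bit source type. The function names are hypothetical and are not part of the patch; the mask is the maximum value of the narrow type, which is what the patch reads via TYPE_MAX_VALUE.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical illustration: NAME = (unsigned char) NAME2.
inline std::uint8_t narrow_by_cast(std::uint32_t name2) {
  return static_cast<std::uint8_t>(name2);
}

// The same value written as NAME2 & MASK, where MASK is the
// maximum value of the narrower unsigned type (0xff here).
inline std::uint32_t narrow_by_mask(std::uint32_t name2) {
  return name2 & std::numeric_limits<std::uint8_t>::max();
}

// Asserts that both forms agree for one input value.
inline void check_cast_matches_mask(std::uint32_t name2) {
  assert(narrow_by_cast(name2) == narrow_by_mask(name2));
}
```

Because both forms keep only the low bits, a range assertion derived for the masked form can plausibly be reused for the cast, which appears to be the point of the hunk.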

Re: PATCH trunk: gengtype honoring mark_hook-s inside struct inside union-s

2012-10-04 Thread Basile Starynkevitch
On Thu, Oct 04, 2012 at 07:26:23PM +0200, Richard Guenther wrote:
> On Thu, Oct 4, 2012 at 7:24 PM, Basile Starynkevitch
>  wrote:
> > On Thu, Oct 04, 2012 at 06:51:35PM +0300, Laurynas Biveinis wrote:
> >> > 2012-10-03  Basile Starynkevitch  
> >> >
> >> > * gengtype.c (walk_type): Emit mark_hook when inside a
> >> >   struct of a union member.
> >>
> >> This is OK.
> >
> > thanks, Committed revision 192092 to trunk.
> >
> >
> > I believe this patch should be backported into GCC 4.7 and 4.6
> 
> I see no reason for this unless it is a regression.

If GCC 4.7 has future micro releases (like a hypothetical 4.7.3), they
will have the same bug.
What is the procedure to get this bug fixed in 4.7.3?

(and there are plugins for 4.7 affected by this bug, http://gcc-melt.org/ for 
example)

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: PING^2: [patch] pr/54508: fix incomplete debug information for class

2012-10-04 Thread Cary Coutant
>   /* We also have to mark its parents as used.
> -(But we don't want to mark our parents' kids due to this.)  */
> +(But we don't want to mark our parent's kids due to this,
> +unless it is a class.)  */
>   if (die->die_parent)
> -   prune_unused_types_mark (die->die_parent, 0);
> +   prune_unused_types_mark (die->die_parent,
> +(die->die_parent->die_tag == 
> DW_TAG_class_type ||
> + die->die_parent->die_tag == 
> DW_TAG_structure_type ||
> + die->die_parent->die_tag == 
> DW_TAG_union_type));

I'd suggest replacing these conditions with a call to class_scope_p().
That will also cover DW_TAG_interface_type, which might be irrelevant
for this particular case, but is probably good to cover in the general
case.

-cary
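
The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the enum and struct are hypothetical stand-ins for the real DWARF tags and DIE structure in dwarf2out.c, and class_like_scope_p only mimics what class_scope_p is described as covering in this thread (class, structure, union and interface types).

```cpp
#include <cassert>

// Hypothetical stand-ins for the DWARF tag values and the DIE type.
enum tag_sketch {
  TAG_class_type,
  TAG_interface_type,
  TAG_structure_type,
  TAG_union_type,
  TAG_namespace
};

struct die_sketch {
  tag_sketch tag;
  const die_sketch *parent;
};

// True if DIE is a class-like scope.  One predicate replaces the
// three open-coded tag comparisons and also covers interface types.
inline bool class_like_scope_p(const die_sketch *die) {
  return die != nullptr
         && (die->tag == TAG_class_type
             || die->tag == TAG_interface_type
             || die->tag == TAG_structure_type
             || die->tag == TAG_union_type);
}

// The flag that would be passed when marking DIE's parent: mark the
// parent's children only when the parent is a class-like scope.
inline bool mark_parent_children_p(const die_sketch *die) {
  assert(die != nullptr);
  return die->parent != nullptr && class_like_scope_p(die->parent);
}
```

With a single predicate, a new class-like tag only has to be added in one place.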


Re: [patch][lra] Improve initial program point density in lra-lives.c

2012-10-04 Thread Vladimir Makarov

On 10/04/2012 12:56 PM, Steven Bosscher wrote:

On Thu, Oct 4, 2012 at 6:12 PM, Vladimir Makarov  wrote:

0.6% sounds really very different from my timings. How much time does
create_start_finish_chains take for you?


0.65% (2.78s).

Actually, I have a profile but I am not sure now that it is for PR54146.
It might be for PR26854.

I'll check it again to be sure.

No, it looks about the same.

Well, that's very strange. Maybe we measure these things differently?
I just hijack a timevar, so I measure e.g. the time spent in
create_start_finish_chains like so:

Index: lra-lives.c
===
--- lra-lives.c (revision 192052)
+++ lra-lives.c (working copy)
@@ -770,6 +812,7 @@ create_start_finish_chains (void)
int i, max_regno;
lra_live_range_t r;

+timevar_push (TV_CPROP);
lra_start_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
lra_finish_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
max_regno = max_reg_num ();
@@ -783,6 +826,7 @@ create_start_finish_chains (void)
   lra_finish_point_ranges[r->finish] = r;
 }
  }
+timevar_pop (TV_CPROP);
  }

  /* Rebuild LRA_START_POINT_RANGES and LRA_FINISH_POINT_RANGES after


so that I get the timings in the -ftime-report like so:

  CPROP   :  43.14 ( 4%) usr
  integrated RA   : 200.81 (17%) usr
  LRA non-specific:  62.18 ( 5%) usr
  LRA virtuals elimination:  61.71 ( 5%) usr
  LRA reload inheritance  :   6.41 ( 1%) usr
  LRA create live ranges  :  139.75 (13%) usr
  LRA hard reg assignment : 130.90 (11%) usr
  LRA coalesce pseudo regs:   2.45 ( 0%) usr
  reload  :   9.09 ( 1%) usr

"Crude, but efficient" (tm) :-)

How do you measure the time spent in that function, and in
remove_some_program_points_and_update_live_ranges?

You use AMD and I use Intel.  So it may be different from the cache point
of view.


Another thing is that I used gprof (-pg was used for bitmap.o, lra*.o and
ira*.o).  Your measurements are more accurate, I think, because they are
made without instrumentation and bitmap.o takes too much time.  Bitmaps do
not work well in this case because they are too big and sparse.




Re: Convert more non-GTY htab_t to hash_table.

2012-10-04 Thread Lawrence Crowl
On 10/4/12, Richard Guenther  wrote:
> On Tue, 2 Oct 2012, Lawrence Crowl wrote:
>> On 10/2/12, Richard Guenther  wrote:
>> > On Mon, 1 Oct 2012, Lawrence Crowl wrote:
>> > > Change more non-GTY hash tables to use the new type-safe
>> > > template hash table.  Constify member function parameters that
>> > > can be const.  Correct a couple of expressions in formerly
>> > > uninstantiated templates.
>> > >
>> > > The new code is 0.362% faster in bootstrap, with a 99.5%
>> > > confidence of being faster.
>> > >
>> > > Tested on x86-64.
>> > >
>> > > Okay for trunk?
>> >
>> > You are changing a hashtable used by fold checking, did you test
>> > with fold checking enabled?
>>
>> I didn't know I had to do anything beyond the normal make check.
>> What do I do?
>>
>> > +/* Data structures used to maintain mapping between basic blocks and
>> > +   copies.  */
>> > +static hash_table  bb_original;
>> > +static hash_table  bb_copy;
>> >
>> > note that because hash_table has a constructor we now get global
>> > CTORs for all statics :( (and mx-protected local inits ...)
>>
>> The overhead for the global constructors isn't significant.
>> Only the function-local statics have mx-protection, and that can
>> be eliminated by making them global static.
>>
>> > Can you please try to remove the constructor from hash_table to
>> > avoid this overhead?  (as a followup - that is, don't initialize
>> > htab)
>>
>> The initialization avoids potential errors in calling dispose.
>> I can do it, but I don't think the overhead (after moving the
>> function-local statics to global) will matter, and so I prefer to
>> keep the safety.  So is the move of the statics sufficient or do
>> you still want to remove constructors?
>
> Hm, having them in-scope where they are used is good style.
> Why can't they be statically initialized and put in .data?
> Please make it so - you know C++ enough (ISTR value-initialization
> is default - which means NULL for the pointer?)

Zero initialization is default for static variables, but not for
local or heap variables.  We can live with the uninitialized memory
in some cases, and add another function to explicitly null the
member in the rest of the cases.  I am not convinced that extra
coding is worth the performance difference, particularly as I do
not expect that difference to be measurable.

However we decide here, I think that work should be a separate patch,
as it will certainly touch more files than the current patch.  So,
can we separate the issue?
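
To make the trade-off concrete, here is a minimal sketch, assuming a hypothetical table type (table_sketch, not the real hash_table) that has no constructor. An object with static storage duration is zero-initialized, so its pointer member is null without any constructor running; a local object needs the explicit nulling function mentioned above before it is safe to query or dispose.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical table type with no user-provided constructor.
struct table_sketch {
  int *entries;

  // Explicit nulling for objects that are not zero-initialized.
  void init() { entries = nullptr; }
  bool is_created() const { return entries != nullptr; }
  void create(std::size_t n) {
    assert(entries == nullptr);  // must not already be created
    entries = new int[n]();
  }
  void dispose() {
    delete[] entries;
    entries = nullptr;
  }
};

// Static storage duration: zero-initialized, so ENTRIES starts out
// null and no global constructor call is needed.
static table_sketch global_table;

// A local object has an indeterminate ENTRIES until init () is called.
inline bool local_use() {
  table_sketch local;
  local.init();
  bool before = local.is_created();
  local.create(4);
  bool after = local.is_created();
  local.dispose();
  return !before && after && !local.is_created();
}
```

The cost of dropping the constructor is that every non-static use must remember the init () call, which is the safety concern raised above.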

>
> Richard.
>
>> > The cfg.c, dse.c and hash-table.h parts are ok for trunk, I'll
>> > leave the rest to respective maintainers of the pieces of the
>> > compiler.
>> >
>> > Thanks,
>> > Richard.
>> >
>> >>
>> >> Index: gcc/java/ChangeLog
>> >>
>> >> 2012-10-01  Lawrence Crowl  
>> >>
>> >>   * Make-lang.in (JAVA_OBJS): Add dependence on hash-table.o.
>> >>   (JCFDUMP_OBJS): Add dependence on hash-table.o.
>> >>   (jcf-io.o): Add dependence on hash-table.h.
>> >>   * jcf-io.c (memoized_class_lookups): Change to use type-safe hash
>> >> table.
>> >>
>> >> Index: gcc/c/ChangeLog
>> >>
>> >> 2012-10-01  Lawrence Crowl  
>> >>
>> >>   * Make-lang.in (c-decl.o): Add dependence on hash-table.h.
>> >>   * c-decl.c (detect_field_duplicates_hash): Change to new type-safe
>> >>   hash table.
>> >>
>> >> Index: gcc/objc/ChangeLog
>> >>
>> >> 2012-10-01  Lawrence Crowl  
>> >>
>> >>   * Make-lang.in (OBJC_OBJS): Add dependence on hash-table.o.
>> >>   (objc-act.o): Add dependence on hash-table.h.
>> >>   * objc-act.c (objc_detect_field_duplicates): Change to new type-safe
>> >>   hash table.
>> >>
>> >> Index: gcc/ChangeLog
>> >>
>> >> 2012-10-01  Lawrence Crowl  
>> >>
>> >>   * Makefile.in (fold-const.o): Add depencence on hash-table.h.
>> >>   (dse.o): Likewise.
>> >>   (cfg.o): Likewise.
>> >>   * fold-const.c (fold_checksum_tree): Change to new type-safe hash
>> >> table.
>> >>   * (print_fold_checksum): Likewise.
>> >>   * cfg.c (var bb_original): Likewise.
>> >>   * (var bb_copy): Likewise.
>> >>   * (var loop_copy): Likewise.
>> >>   * hash-table.h (template hash_table): Constify parameters for find...
>> >>   and remove_elt... member functions.
>> >> (hash_table::empty) Correct size expression.
>> >> (hash_table::clear_slot) Correct deleted entry assignment.
>> >>   * dse.c (var rtx_group_table): Change to new type-safe hash table.
>> >>
>> >> Index: gcc/cp/ChangeLog
>> >>
>> >> 2012-10-01  Lawrence Crowl  
>> >>
>> >>   * Make-lang.in (class.o): Add dependence on hash-table.h.
>> >>   (tree.o): Likewise.
>> >>   (semantics.o): Likewise.
>> >>   * class.c (fixed_type_or_null): Change to new type-safe hash table.
>> >>   * tree.c (verify_stmt_tree): Likewise.
>> >>   (verify_stmt_tree_r): Likewise.
>> >>   * semantics.c (struct nrv_data): Likewise.
>> >>
>> >>
>> >> Index: gcc/java/Make-lang.in
>> >> ===
>> >> --- gcc/java/Make-lang.in (revision 191941)
>> >> +++ gcc/java/Ma

Re: patch to fix constant math

2012-10-04 Thread Kenneth Zadeck

On 10/04/2012 12:58 PM, Richard Guenther wrote:

On Thu, Oct 4, 2012 at 3:55 PM, Kenneth Zadeck  wrote:

Let me talk about the mode here first.

What this interface/patch provides is a facility where the constant math
that is done in optimizations is done exactly the way that it would be done
on the target machine.   What we have now is a compiler that only does this
if it is convenient to do on the host.   I admit that i care about this more
than others right now, but if Intel adds a couple more instructions to
their vector units, other people will start to really care about this issue.
If you take an OImode value with the current compiler and left shift it by
250 the middle end will say that the result is 0.   This is just wrong!!!

What this means is that the bitsize and precision of the operations need to
be carried along when doing math. When wide-int checks for overflow on the
multiply or add, it is not checking if the value overflowed on two HWIs,
it is checking if the add overflowed in the mode of the types that are
represented on the target.   When we do shift, we are not doing a shift
within two HWIs, we are truncating the shift value (if this is appropriate)
according to the bitsize and shifting according to the precision.
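
The OImode example can be sketched as follows. This is a hypothetical illustration and not the wide-int implementation: a 256-bit value is modelled as four 64-bit words, and the second function models a representation limited to two 64-bit words, assuming HWIs are 64 bits wide.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 256-bit value: four 64-bit words, least significant first.
typedef std::array<std::uint64_t, 4> wide256;

// Left shift carried out in the full 256-bit precision.
inline wide256 shift_left_256(const wide256 &v, unsigned n) {
  assert(n < 256);
  wide256 r = {{0, 0, 0, 0}};
  const unsigned word_shift = n / 64;
  const unsigned bit_shift = n % 64;
  for (unsigned i = word_shift; i < 4; ++i) {
    std::uint64_t part = v[i - word_shift] << bit_shift;
    // Bring in the bits shifted out of the next lower source word.
    if (bit_shift != 0 && i > word_shift)
      part |= v[i - word_shift - 1] >> (64 - bit_shift);
    r[i] = part;
  }
  return r;
}

// The same shift when only the low 128 bits (two words) are kept.
inline wide256 shift_left_low128_only(const wide256 &v, unsigned n) {
  wide256 r = shift_left_256(v, n);
  r[2] = 0;
  r[3] = 0;
  return r;
}
```

Shifting the value 1 left by 250 sets bit 58 of the top word in the first function, while the two-word version returns all zeros, which is the wrong answer described above.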

I think that an argument could be made that storing the mode should be
changed to an explicit precision and bitsize.  (A possible other option
would be to store a tree type, but this would make the usage at the rtl
level very cumbersome since types are rare.) Aside from the work, you would
not get much push back.

But the signedness is a different argument.   At the rtl level, the signedness
is a matter of context.   (you could argue that this is a mistake and i would
agree, but that is an even bigger change.)   But more to the point, at the
tree level, there are a surprising number of places where the operation
desired does not follow the sign of the types that were used to construct
the constants.   Furthermore, not carrying the sign is more consistent with
the double int code, which as you point out carries nothing.

Well, on RTL the signedness is on the operation (you have sdiv and udiv, etc.).

Yes, there is a complete enough set of operations that allow you to
specify the signedness where this matters.



double-int tries to present a sign-less twos-complement entity of size
2 * HOST_BITS_PER_WIDE_INT.  I think that is sensible and for
obvious reasons should not change.  Both tree and RTL rely on this.
What we do not want is that up to TImode you get an internal representation
done one way (twos-complement) and on OImode and larger you
suddenly get subtly different behavior.  That's a recipe for disaster.


This is the main difference between double-int and wide-int.  Wide int
does the math the way the machine does it or the way the front end would
expect it to be done.  There is nothing about the host that is visible
in the interfaces.


I reiterate, our world is already bigger than 128 bits and the Intel
world is likely to be soon.   Double int is stuck in a 64/128 bit world.
These patches, which i admit are huge, are a way out of that box.




I'd like to clean up the interface to double-int some more (now with the
nice C++ stuff we have).  double-int should be pure twos-complement,
there should be no operations on double-ints that behave differently
when done signed or unsigned, instead we have signed and unsigned
versions of the operations (similar to how signedness is handled on
the RTL level).  With some trivial C++ fu you could have a
double_sint and double_uint type that would get rid of the bool
sign params we have to some functions (and then you could
write double_sint >> n using operator notation).
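
A rough sketch of that idea, under stated assumptions: the type and function names below are hypothetical, not the proposed interface, and the representation is simply two 64-bit words. Both wrapper types share one sign-less two's-complement representation, and the choice between logical and arithmetic shift is made by the type instead of a bool parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical shared representation: sign-less two's complement.
struct double_bits {
  std::uint64_t low;
  std::uint64_t high;
};

// Right shift by N; FILL is the word shifted in from the top
// (all zeros for a logical shift, all ones for a negative value).
inline double_bits shift_right(double_bits v, unsigned n, std::uint64_t fill) {
  assert(n < 128);
  if (n == 0)
    return v;
  double_bits r;
  if (n >= 64) {
    const unsigned k = n - 64;
    r.low = (k == 0) ? v.high : ((v.high >> k) | (fill << (64 - k)));
    r.high = fill;
  } else {
    r.low = (v.low >> n) | (v.high << (64 - n));
    r.high = (v.high >> n) | (fill << (64 - n));
  }
  return r;
}

struct double_uint_sketch { double_bits bits; };
struct double_sint_sketch { double_bits bits; };

// Unsigned wrapper: logical shift, zeros come in from the top.
inline double_uint_sketch operator>>(double_uint_sketch v, unsigned n) {
  double_uint_sketch r = { shift_right(v.bits, n, 0) };
  return r;
}

// Signed wrapper: arithmetic shift, the sign bit is replicated.
inline double_sint_sketch operator>>(double_sint_sketch v, unsigned n) {
  const std::uint64_t fill = (v.bits.high >> 63) ? ~std::uint64_t(0) : 0;
  double_sint_sketch r = { shift_right(v.bits, n, fill) };
  return r;
}
```

The same bit pattern shifted through the two types gives different results, so callers pick the behavior by choosing the type once instead of passing a flag at every call.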


The problem is that size does matter.  Wide int is effectively
infinite precision twos complement.  In practice, we can get by just by
looking at the bitsize and precision of the types/modes involved
and this makes the implementation faster than true infinite precision.


I went down the road trying to fix all of the places where the compiler
either iced or got the wrong answer.   I showed this to Sandiford and he
talked me out of it.  He was right, it was a rat hole.  It could have
been a smaller patch but there were places where it was clearly
going to take monumental work just to be able to back out and say that
you had nothing.  The number of places in the compiler where you
compare against the largest and smallest representation of an integer is
not small and some of them are buried very deep down chains that were
not designed to say "i cannot answer that question".


I believe that i have all of the functionality of double int in wide
int, it is just that the calls look different because there are not all of
the interfaces that take two HWI's.  As mentioned before, all of the
places where the overflow is computed for the purpose of asking if this
is ok in two hwi's are gone.




I'd like wide-int (whatever it's internal represen

Re: Use conditional casting with symtab_node

2012-10-04 Thread Diego Novillo
On Thu, Oct 4, 2012 at 2:14 PM, Lawrence Crowl  wrote:

> So, Jan Hubicka requested and approved the current spelling.
> What now?

I don't think we should hold this up.  The names Jan requested seem
reasonable enough.  We seem to be running in circles here.


Diego.


Re: PATCH trunk: gengtype honoring mark_hook-s inside struct inside union-s

2012-10-04 Thread Basile Starynkevitch
On Wed, Oct 03, 2012 at 01:02:44PM +0200, Basile Starynkevitch wrote:
> So I applied and I am proposing the following patch to gcc trunk 192031
> (Laurynas, I did take your remarks into account)
> # patch to trunk
> Index: gcc/gengtype.c
> ===
> --- gcc/gengtype.c(revision 192031)
> +++ gcc/gengtype.c(working copy)
> @@ -2810,6 +2810,7 @@ walk_type (type_p t, struct walk_type_data *d)
>   const char *oldval = d->val;
>   const char *oldprevval1 = d->prev_val[1];
>   const char *oldprevval2 = d->prev_val[2];
> + const char *struct_mark_hook = NULL;
>   const int union_p = t->kind == TYPE_UNION;
>   int seen_default_p = 0;
>   options_p o;
> @@ -2833,7 +2834,14 @@ walk_type (type_p t, struct walk_type_data *d)
> if (!desc && strcmp (o->name, "desc") == 0
> && o->kind == OPTION_STRING)
>   desc = o->info.string;
> +   else if (!struct_mark_hook && strcmp (o->name, "mark_hook") == 0
> +&& o->kind == OPTION_STRING)
> + struct_mark_hook = o->info.string;
>  
> + if (struct_mark_hook) 
> + oprintf (d->of, "%*s%s (&%s));\n",
> +  d->indent, "", struct_mark_hook, oldval);

Sorry for the typo, the patch should have only one closing parenthesis.

So I'm applying the following patch to trunk to correct it.

Index: gengtype.c
===
--- gengtype.c  (revision 192094)
+++ gengtype.c  (working copy)
@@ -2839,7 +2839,7 @@
struct_mark_hook = o->info.string;
 
if (struct_mark_hook)
- oprintf (d->of, "%*s%s (&%s));\n",
+ oprintf (d->of, "%*s%s (&%s);\n",
   d->indent, "", struct_mark_hook, oldval);
 
d->prev_val[2] = oldval;


I hope that correcting such a typo falls under the "obvious patch" rule, so I
dare to commit it right now, reusing the same ChangeLog entry.

 % svn commit gcc/gengtype.c 
Sending        gcc/gengtype.c
Transmitting file data .
Committed revision 192095.

If you feel it is wrong to correct such a typo without asking, I'll revert this 
obvious commit.

Apologies for the typo.

Cheers.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: PING^2: [patch] pr/54508: fix incomplete debug information for class

2012-10-04 Thread Paul_Koning

On Oct 4, 2012, at 1:38 PM, Cary Coutant wrote:

>>  /* We also have to mark its parents as used.
>> -(But we don't want to mark our parents' kids due to this.)  */
>> +(But we don't want to mark our parent's kids due to this,
>> +unless it is a class.)  */
>>  if (die->die_parent)
>> -   prune_unused_types_mark (die->die_parent, 0);
>> +   prune_unused_types_mark (die->die_parent,
>> +(die->die_parent->die_tag == 
>> DW_TAG_class_type ||
>> + die->die_parent->die_tag == 
>> DW_TAG_structure_type ||
>> + die->die_parent->die_tag == 
>> DW_TAG_union_type));
> 
> I'd suggest replacing these conditions with a call to class_scope_p().
> That will also cover DW_TAG_interface_type, which might be irrelevant
> for this particular case, but is probably good to cover in the general
> case.
> 
> -cary

Thanks, that makes it very simple.  Here is the updated patch.

Ok to commit with this change?

paul

ChangeLog:

2012-10-04  Paul Koning  

* dwarf2out.c (prune_unused_types_mark): Mark all of parent's
children if parent is a class.

testsuite/ChangeLog:

2012-10-04  Paul Koning  

* g++.dg/debug/dwarf2/pr54508.C: New.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 192048)
+++ gcc/dwarf2out.c (working copy)
@@ -21035,9 +21035,11 @@
   prune_unused_types_mark_generic_parms_dies (die);
 
   /* We also have to mark its parents as used.
-(But we don't want to mark our parents' kids due to this.)  */
+(But we don't want to mark our parent's kids due to this,
+unless it is a class.)  */
   if (die->die_parent)
-   prune_unused_types_mark (die->die_parent, 0);
+   prune_unused_types_mark (die->die_parent, 
+class_scope_p (die->die_parent));
 
   /* Mark any referenced nodes.  */
   prune_unused_types_walk_attribs (die);
Index: testsuite/g++.dg/debug/dwarf2/pr54508.C
===
--- testsuite/g++.dg/debug/dwarf2/pr54508.C (revision 0)
+++ testsuite/g++.dg/debug/dwarf2/pr54508.C (revision 0)
@@ -0,0 +1,67 @@
+// PR debug/54508
+// { dg-do compile }
+// { dg-options "-g2 -dA" }
+
+// { dg-final { scan-assembler "\"cbase0\"\[ \t\]+\[#;/!|@\]+ 
DW_AT_name\|DW_AT_name: \"cbase\"" } }
+// { dg-final { scan-assembler "\"OPCODE0\"\[ \t\]+\[#;/!|@\]+ 
DW_AT_name\|DW_AT_name: \"OPCODE\"" } }
+// { dg-final { scan-assembler "\"bi0\"\[ \t\]+\[#;/!|@\]+ DW_AT_name" } }
+// { dg-final { scan-assembler "\"si0\"\[ \t\]+\[#;/!|@\]+ DW_AT_name" } }
+// { dg-final { scan-assembler "\"f10\"\[ \t\]+\[#;/!|@\]+ DW_AT_name" } }
+// { dg-final { scan-assembler "\"f20\"\[ \t\]+\[#;/!|@\]+ DW_AT_name" } }
+// { dg-final { scan-assembler-not "\"nc0\"\[ \t\]+\# 
DW_AT_name\|DW_AT_name: \"nc\"" } }
+
+class cbase
+
+{
+public:
+ static int si;
+int bi;
+};
+
+class c : public cbase
+
+{
+public:
+ enum
+ {
+  OPCODE = 251
+ };
+ int i ;
+ static const char *testc (void) { return "foo"; }
+};
+
+struct s
+{
+int f1;
+static const char *tests (void) { return "test"; }
+};
+
+union u
+{
+int f2;
+double d;
+static const char *testu (void) { return "test union"; }
+};
+
+namespace n 
+{
+const char *ntest (void) { return "test n"; }
+
+class nc
+{
+public:
+int i;
+static int sj;
+};
+}
+
+extern void send (int, int, const void *, int);
+
+void test (int src)
+{
+  int cookie = 1;
+  send(src, c::OPCODE, c::testc (), cookie);
+  send(src, c::OPCODE, s::tests (), cookie);
+  send(src, c::OPCODE, u::testu (), cookie);
+  send(src, c::OPCODE, n::ntest (), cookie);
+}


Re: RFC: LRA for x86/x86-64 [0/9]

2012-10-04 Thread Steven Bosscher
On Sat, Sep 29, 2012 at 10:26 PM, Steven Bosscher  wrote:
> To put it in another perspective, here are my timings of trunk vs lra
> (both checkouts done today):
>
> trunk:
>  integrated RA   : 181.68 (24%) usr   1.68 (11%) sys 183.43 (24%) wall  643564 kB (20%) ggc
>  reload          :  11.00 ( 1%) usr   0.18 ( 1%) sys  11.17 ( 1%) wall   32394 kB ( 1%) ggc
>  TOTAL           : 741.64            14.76           756.41           3216164 kB
>
> lra branch:
>  integrated RA   : 174.65 (16%) usr   1.33 ( 8%) sys 176.33 (16%) wall  643560 kB (20%) ggc
>  reload          : 399.69 (36%) usr   2.48 (15%) sys 402.69 (36%) wall   41852 kB ( 1%) ggc
>  TOTAL           :1102.06            16.05          1120.83           3231738 kB
>
> That's a 49% slowdown. The difference is completely accounted for by
> the timing difference between reload and LRA.

With Vlad's patch to switch off expensive LRA parts for extreme
functions ([lra revision 192093]), the numbers are:

 integrated RA           : 154.27 (17%) usr   1.27 ( 8%) sys 155.64 (17%) wall  131534 kB ( 5%) ggc
 LRA non-specific        :  69.67 ( 8%) usr   0.79 ( 5%) sys  70.40 ( 8%) wall   18805 kB ( 1%) ggc
 LRA virtuals elimination:  55.53 ( 6%) usr   0.00 ( 0%) sys  55.49 ( 6%) wall   20465 kB ( 1%) ggc
 LRA reload inheritance  :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      57 kB ( 0%) ggc
 LRA create live ranges  :  80.46 ( 4%) usr   1.05 ( 6%) sys  81.49 ( 4%) wall    2459 kB ( 0%) ggc
 LRA hard reg assignment :   1.78 ( 0%) usr   0.05 ( 0%) sys   1.85 ( 0%) wall       0 kB ( 0%) ggc
 reload                  :   6.38 ( 1%) usr   0.13 ( 1%) sys   6.51 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                   : 917.42            16.35           933.78           2720151 kB

Recalling trunk total time (r191835):

>  TOTAL           : 741.64            14.76           756.41

the slowdown due to LRA is down from 49% to 23%, with still room for
improvement (even without crippling LRA further). Size with the
expensive LRA parts switched off is still better than trunk:
$ size slow.o*
   text    data     bss     dec     hex filename
3499938       8     583 3500529  3569f1 slow.o.00_trunk_r191835
3386117       8     583 3386708  33ad54 slow.o.01_lra_r191626
3439755       8     583 3440346  347eda slow.o.02_lra_r192093

The lra-branch outperforms trunk on everything else I've thrown at it,
in terms of compile time and code size at least, and also e.g. on
Fortran polyhedron runtime.

Ciao!
Steven

