Re: [patch] Fix PR tree-optimization/50413

2011-09-24 Thread Richard Guenther
On Mon, Sep 19, 2011 at 1:46 PM, Ira Rosen  wrote:
> Hi,
>
> When we can't vectorize a certain statement in SLP we mark it as not
> vectorizable and continue with the analysis. This is wrong when the
> reason for the failure is that we can't analyze a data-ref, because
> this way we may miss a data dependence. This patch fails SLP if the
> data-refs analysis fails.
>
> Bootstrapped and tested on powerpc64-suse-linux and i486-linux-gnu.
> Committed to trunk.
>
> The same patch bootstrapped and tested on powerpc64-suse-linux for 4.6.
> O.K. for 4.6?

Ok for 4.6 if you resolve the testsuite issue.

Richard.

> Thanks,
> Ira
>
> ChangeLog:
>
>        PR tree-optimization/50413
>        * tree-vect-data-refs.c (vect_analyze_data_refs): Fail to vectorize
>        a basic block if one of its data-refs can't be analyzed.
>
> testsuite/ChangeLog:
>
>        PR tree-optimization/50413
>        * g++.dg/vect/vect.exp: Run slp-pr* tests with
>        -fdump-tree-slp-details.  Run other tests with
>        -fdump-tree-vect-details.
>        * g++.dg/vect/slp-pr50413.cc: New.
>


Re: [PATCH] Distribute inliner's size_time data across entries with similar predicates

2011-09-24 Thread Richard Guenther
On Mon, Sep 19, 2011 at 10:16 PM, Maxim Kuvyrkov  wrote:
> Jan,
>
> The following patch started as a one-liner for ipa-inline-analysis.c: 
> account_size_time() to merge predicates when we are adding data to entry[0] 
> (i.e., when space for 32 size_time entries is exhausted):
>
> @@ -537,6 +592,9 @@ account_size_time (struct inline_summary
>     }
>   else
>     {
> +      e->predicate = or_predicates (summary->conds, &e->predicate, pred);
>       e->size += size;
>       e->time += time;

Shouldn't this be either and_predicates or, if using or_predicates, not
accumulating but taking the minimum (or maximum?) of the size/time values?

I wonder why we bother to record so many predicates though.

>       if (e->time > MAX_TIME * INLINE_TIME_SCALE)
>
> The rationale was that since we are accounting size and time under the entry 
> we also need to make entry's predicate a superset of the predicate we want to 
> account the data under.
>
> Then I thought that mushing all predicates into the single predicate of 
> entry[0] will cause it to become true_predicate() very quickly, so I added 
> logic to distribute incoming size_time information across all 32 entries by 
> searching for entries with similar predicates.

That sounds expensive (without looking at the patch).  Shouldn't we
instead look for similar size/time values (maybe sorting the
predicates)?
Thus, when size and time are equal we can simply or the predicates.
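
The entry[0] fallback quoted earlier in this message can be illustrated
with a toy model.  This is a minimal sketch, not GCC's code: it assumes
a predicate can be represented as a bitmask of the scenarios in which it
holds (so OR is a bitwise union), and toy_entry, toy_summary,
toy_account, toy_size_in and TOY_MAX_ENTRIES are hypothetical names.  It
only shows why the entry's predicate has to become a superset of the
incoming one.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not GCC's data structures: a predicate is the set of
   scenarios (one bit each) in which it is true.  */
typedef uint32_t toy_predicate;

#define TOY_MAX_ENTRIES 4   /* The mail mentions 32; kept small here.  */

struct toy_entry
{
  toy_predicate predicate;
  int size;
  int time;
};

struct toy_summary
{
  struct toy_entry entry[TOY_MAX_ENTRIES];
  int n_entries;
};

/* Account SIZE and TIME under predicate PRED.  */
static void
toy_account (struct toy_summary *s, toy_predicate pred, int size, int time)
{
  int i;

  /* Reuse an entry with exactly the same predicate if there is one.  */
  for (i = 0; i < s->n_entries; i++)
    if (s->entry[i].predicate == pred)
      {
        s->entry[i].size += size;
        s->entry[i].time += time;
        return;
      }

  /* Otherwise take a fresh entry while space remains.  */
  if (s->n_entries < TOY_MAX_ENTRIES)
    {
      struct toy_entry *e = &s->entry[s->n_entries++];
      e->predicate = pred;
      e->size = size;
      e->time = time;
      return;
    }

  /* Table full: fall back to entry 0 and widen its predicate so that
     it is true whenever PRED is true.  */
  s->entry[0].predicate |= pred;
  s->entry[0].size += size;
  s->entry[0].time += time;
}

/* Estimated size in SCENARIO (a single bit).  */
static int
toy_size_in (const struct toy_summary *s, toy_predicate scenario)
{
  int i, total = 0;

  for (i = 0; i < s->n_entries; i++)
    if (s->entry[i].predicate & scenario)
      total += s->entry[i].size;
  return total;
}
```

In this model the estimate for a scenario never drops below what was
accounted for it, at the price of overestimating the scenarios that
entry 0 originally covered.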

Richard.

> OK for trunk assuming testing on x86_64-linux-gnu shows no regressions?
>
> Thank you,
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>
>
>


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-24 Thread Richard Guenther
On Tue, Sep 20, 2011 at 1:13 PM, Tom de Vries  wrote:
> Hi Richard,
>
> I have a patch for PR43814. It introduces an option that assumes that function
> arguments of pointer type are aligned, and uses that information in
> tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined.
>
> I tested the patch successfully on-by-default on x86_64 and i686 (both
> gcc-only builds).
>
> I also tested the patch on-by-default for ARM (gcc/glibc build). The patch
> generated wrong code for uselocale.c:
> ...
> glibc/locale/locale.h:
> ...
> /* This value can be passed to `uselocale' and may be returned by
>   it. Passing this value to any other function has undefined behavior.  */
> # define LC_GLOBAL_LOCALE       ((__locale_t) -1L)
> ...
> glibc/locale/uselocale.c:
> ...
> locale_t
> __uselocale (locale_t newloc)
> {
>  locale_t oldloc = _NL_CURRENT_LOCALE;
>
>  if (newloc != NULL)
>    {
>      const locale_t locobj
>        = newloc == LC_GLOBAL_LOCALE ? &_nl_global_locale : newloc;
>
> ...
> The assumption that function arguments of pointer type are aligned allowed
> the test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false.
> But the usage of ((__locale_t) -1L) as function argument in uselocale violates
> that assumption.
>
> Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without
> regressions for ARM.
>
> Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM,
> discussed here:
> - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html
> - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html
>
> But, since glibc uses this construct currently, the option is
> off-by-default for now.
>
> OK for trunk?

No, I don't like to have an option to control this.  And given the issue
you spotted it doesn't look like the best idea either.  This area in GCC
is simply too fragile right now :/

It would be nice if we could extend IPA-CP to propagate alignment
information though.

And at some point devise a reliable way for frontends to communicate
alignment constraints the middle-end can rely on (well, yes, you could
argue that's what TYPE_ALIGN is about, and I thought that maybe we
can unconditionally rely on TYPE_ALIGN for pointer PARM_DECLs
at least - your example shows we can't).

In the end I'd probably say the patch is ok without the option (thus
turned on by default), but if LC_GLOBAL_LOCALE is part of the
glibc ABI then we clearly can't do this.

Richard.

> Thanks,
> - Tom
>
> 2011-09-20  Tom de Vries 
>
>        PR target/43814
>        * tree-ssa-ccp.c (get_align_value): New function, factored out of
>        get_value_from_alignment.
>        (get_value_from_alignment): Use get_align_value.
>        (get_value_for_expr): Use get_align_value to handle alignment of
>        function argument pointers.
>        * common.opt (faligned-pointer-argument): New option.
>        * doc/invoke.texi (Optimization Options): Add
>        -faligned-pointer-argument.
>        (-faligned-pointer-argument): New item.
>
>        * gcc/testsuite/gcc.dg/pr43814.c: New test.
>        * gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
>


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-24 Thread Jakub Jelinek
On Sat, Sep 24, 2011 at 11:31:25AM +0200, Richard Guenther wrote:
> In the end I'd probably say the patch is ok without the option (thus
> turned on by default), but if LC_GLOBAL_LOCALE is part of the
> glibc ABI then we clearly can't do this.

Yes, LC_GLOBAL_LOCALE is part of glibc ABI.  I guess we could only assume
the alignment if the pointer is actually dereferenced on the statement
that checks the ABI or in some stmt that dominates the spot where you want
to check the alignment.  It is IMHO quite common to pass arbitrary values
in pointer types, then cast them back or just compare.
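
A minimal sketch of that pattern, loosely modelled on the glibc code
quoted earlier in the thread.  The toy_* names are hypothetical
stand-ins, not the glibc definitions, and the sketch assumes a common
flat-address platform where converting -1L to a pointer and back is
well-behaved (the conversion is implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the glibc definitions.  */
struct toy_locale { int id; };
typedef struct toy_locale *toy_locale_t;

#define TOY_GLOBAL_LOCALE ((toy_locale_t) -1L)

static struct toy_locale toy_global = { 0 };

/* Map the sentinel to the global object; pass real pointers through.
   The sentinel is only compared, never dereferenced.  */
static toy_locale_t
toy_resolve (toy_locale_t loc)
{
  return loc == TOY_GLOBAL_LOCALE ? &toy_global : loc;
}

/* Return nonzero if P's address is a multiple of the alignment of
   struct toy_locale.  */
static int
toy_is_aligned (toy_locale_t p)
{
  return (uintptr_t) p % _Alignof (struct toy_locale) == 0;
}
```

A compiler that assumed every toy_locale_t argument is suitably aligned
could conclude that the comparison in toy_resolve is always false, which
is the failure mode reported for uselocale.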

Jakub


Re: misbehaviour with md5_process_bytes and maybe in optimization

2011-09-24 Thread Basile Starynkevitch
On Fri, 23 Sep 2011 12:19:57 -0700
Ian Lance Taylor  wrote:

> I committed this patch to mainline to fix the problem.  Bootstrapped on
> x86_64-unknown-linux-gnu.
> 
> 2011-09-23  Ian Lance Taylor  
> 
>   * md5.c (md5_process_bytes): Correct handling of unaligned
>   buffer.

This is *exactly* the same patch as
Pierre Vittet proposed in 
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00963.html
(but Pierre's patch has not been reviewed).

Perhaps the ChangeLog might also mention Pierre Vittet for that particular 
patch???

Cheers.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***


Re: Handle non-SSA arguments in ipa-inline-analysis better

2011-09-24 Thread Richard Guenther
On Thu, Sep 22, 2011 at 12:51 AM, Jan Hubicka  wrote:
> Hi,
> this patch extends handling of non-SSA arguments to builtin_constant_p and
> execution predicates.
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>
> Honza
>
>        * ipa-inline-analysis.c (set_cond_stmt_execution_predicate): Allow
>        handled components in parameter of builtin_constant_p.
>        (will_be_nonconstant_predicate): Allow loads of non-SSA parameters.
> Index: ipa-inline-analysis.c
> ===
> --- ipa-inline-analysis.c       (revision 179046)
> +++ ipa-inline-analysis.c       (working copy)
> @@ -1202,6 +1202,7 @@ set_cond_stmt_execution_predicate (struc
>   gimple set_stmt;
>   tree op2;
>   tree parm;
> +  tree base;
>
>   last = last_stmt (bb);
>   if (!last
> @@ -1252,7 +1253,8 @@ set_cond_stmt_execution_predicate (struc
>       || gimple_call_num_args (set_stmt) != 1)
>     return;
>   op2 = gimple_call_arg (set_stmt, 0);
> -  parm = unmodified_parm (set_stmt, op2);
> +  base = get_base_address (op2);
> +  parm = unmodified_parm (set_stmt, base ? base : op2);

You don't care for a NULL get_base_address return in the other
place you added it.  I think you should instead bail out on a
NULL return from it.

Note that get_base_address will not strip things down to an
SSA name pointer but will return a MEM[ptr, offset] for that.
But I suppose unmodified_parm / ipa_get_parm_decl_index
already cares for that (dereference of argument)?

>   if (!parm)
>     return;
>   index = ipa_get_param_decl_index (info, parm);
> @@ -1433,6 +1435,7 @@ will_be_nonconstant_predicate (struct ip
>   ssa_op_iter iter;
>   tree use;
>   struct predicate op_non_const;
> +  bool is_load;
>
>   /* What statments might be optimized away
>      when their arguments are constant
> @@ -1443,11 +1446,29 @@ will_be_nonconstant_predicate (struct ip
>       && gimple_code (stmt) != GIMPLE_SWITCH)
>     return p;
>
> -  /* Stores and loads will stay anyway.
> -     TODO: Constant memory accesses could be handled here, too.  */
> -  if (gimple_vuse (stmt))
> +  /* Stores will stay anyway.  */
> +  if (gimple_vdef (stmt))
>     return p;
>
> +  is_load = gimple_vuse (stmt) != NULL;
> +
> +  /* Loads can be optimized when the value is known.  */
> +  if (is_load)
> +    {
> +      tree op = gimple_assign_rhs1 (stmt);
> +      tree base = get_base_address (op);
> +      tree parm;
> +
> +      gcc_assert (gimple_assign_single_p (stmt));
> +      if (!base)
> +       return p;
> +      parm = unmodified_parm (stmt, base);
> +      if (!parm )
> +       return p;
> +      if (ipa_get_param_decl_index (info, parm) < 0)
> +       return p;
> +    }
> +
>   /* See if we understand all operands before we start
>      adding conditionals.  */
>   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> @@ -1466,6 +1487,15 @@ will_be_nonconstant_predicate (struct ip
>       return p;
>     }
>   op_non_const = false_predicate ();
> +  if (is_load)
> +    {
> +      tree parm = unmodified_parm
> +                   (stmt, get_base_address (gimple_assign_rhs1 (stmt)));
> +      p = add_condition (summary,
> +                        ipa_get_param_decl_index (info, parm),
> +                        IS_NOT_CONSTANT, NULL);
> +      op_non_const = or_predicates (summary->conds, &p, &op_non_const);
> +    }
>   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
>     {
>       tree parm = unmodified_parm (stmt, use);
>


Re: Switch function context when doing inline analysis

2011-09-24 Thread Richard Guenther
On Thu, Sep 22, 2011 at 12:52 AM, Jan Hubicka  wrote:
>
> Hi,
> ipa-inline-analysis uses is_gimple_min_invariant, which in turn requires
> current_function_decl to be set to the corresponding function; otherwise
> all addresses of automatic vars are considered non-invariant.
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.

Ick.  Please instead consider adding variants which take a struct function *
to the various predicates (I guess just decl_address_invariant_p is the real
issue).  I'd name them is_gimple_min_invariant_fn and
decl_address_invariant_fn_p.
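
The difference between the two designs can be sketched with a toy model.
This is only an illustration under stated assumptions: toy_function and
toy_decl are hypothetical stand-ins for GCC's struct function and decl
trees, and the toy_* predicates merely mimic the idea that the address
of an automatic variable is invariant only inside its owning function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real GCC types are far richer.  */
struct toy_function { int id; };

struct toy_decl
{
  int is_static;                     /* Static storage duration.  */
  const struct toy_function *owner;  /* Owner of an automatic variable.  */
};

/* Global context, playing the role of current_function_decl.  */
static const struct toy_function *toy_current_function;

/* Explicit-context variant: the address of an automatic variable is
   treated as invariant only within the function FN that owns it.  */
static int
toy_decl_address_invariant_fn_p (const struct toy_decl *d,
                                 const struct toy_function *fn)
{
  return d->is_static || (fn != NULL && d->owner == fn);
}

/* Global-state variant: gives the intended answer only if the caller
   has set toy_current_function beforehand.  */
static int
toy_decl_address_invariant_p (const struct toy_decl *d)
{
  return toy_decl_address_invariant_fn_p (d, toy_current_function);
}
```

With the explicit parameter a caller such as an IPA analysis does not
have to save, set and restore global state around each query.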

>        * ipa-inline-analysis.c (compute_inline_parameters): Set
>        cfun and current_function_decl.
>
> Index: ipa-inline-analysis.c
> ===
> --- ipa-inline-analysis.c       (revision 179046)
> +++ ipa-inline-analysis.c       (working copy)
> @@ -1694,6 +1724,7 @@ compute_inline_parameters (struct cgraph
>   HOST_WIDE_INT self_stack_size;
>   struct cgraph_edge *e;
>   struct inline_summary *info;
> +  tree old_decl = current_function_decl;
>
>   gcc_assert (!node->global.inlined_to);
>
> @@ -1718,6 +1749,10 @@ compute_inline_parameters (struct cgraph
>       return;
>     }
>
> +  /* Even is_gimple_min_invariant rely on current_function_decl.  */
> +  current_function_decl = node->decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> +
>   /* Estimate the stack size for the function if we're optimizing.  */
>   self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
>   info->estimated_self_stack_size = self_stack_size;
> @@ -1757,6 +1792,8 @@ compute_inline_parameters (struct cgraph
>   info->size = info->self_size;
>   info->stack_frame_offset = 0;
>   info->estimated_stack_size = info->estimated_self_stack_size;
> +  current_function_decl = old_decl;
> +  pop_cfun ();
>  }
>
>
>


Re: [PATCH, testsuite] Add loop unrolling command line options for some test cases

2011-09-24 Thread Richard Guenther
On Thu, Sep 22, 2011 at 6:22 AM, Jiangning Liu  wrote:
> Hi Mike,
>
> OK. I will wait 24 more hours. If no objections by then, I will get it
> checked into trunk.

I don't think you need -funroll-loops though.

> Thanks,
> -Jiangning
>
>> -Original Message-
>> From: Mike Stump [mailto:mikest...@comcast.net]
>> Sent: Thursday, September 22, 2011 3:10 AM
>> To: Jiangning Liu
>> Cc: gcc-patches@gcc.gnu.org; r...@cebitec.uni-bielefeld.de
>> Subject: Re: [PATCH, testsuite] Add loop unrolling command line options
>> for some test cases
>>
>> On Sep 21, 2011, at 1:22 AM, Jiangning Liu wrote:
>> > The fix is to explicitly turn on loop unroll and set max-unroll-times
>> to 8,
>> > which is larger than the unrolling times being detected in the cases.
>>
>> Sounds reasonable to me.  Ok, though, do watch for any comments by
>> people that know more than I.
>
>
>
>
>


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-09-24 Thread Richard Guenther
On Sat, Sep 24, 2011 at 11:40 AM, Jakub Jelinek  wrote:
> On Sat, Sep 24, 2011 at 11:31:25AM +0200, Richard Guenther wrote:
>> In the end I'd probably say the patch is ok without the option (thus
>> turned on by default), but if LC_GLOBAL_LOCALE is part of the
>> glibc ABI then we clearly can't do this.
>
> Yes, LC_GLOBAL_LOCALE is part of glibc ABI.  I guess we could only assume
> the alignment if the pointer is actually dereferenced on the statement
> that checks the ABI or in some stmt that dominates the spot where you want
> to check the alignment.  It is IMHO quite common to pass arbitrary values
> in pointer types, then cast them back or just compare.

Yeah (even if technically invoking undefined behavior in C).  Checking if
there is a dereference post-dominating function entry with sth like

  FOR_EACH_IMM_USE_STMT (... ptr ...)
    if (stmt_post_dominates_entry && contains dereference of ptr)
      alignment = TYPE_ALIGN (...);

and otherwise not assuming anything about parameter alignment might work.
Be careful to check the alignment of the dereference though,

typedef int int_unaligned __attribute__((aligned(1)));
int foo (int *p)
{
  int_unaligned *q = p;
  return *q;
}

will be MEM[p] but with (well, hopefully ;)) TYPE_ALIGN of TREE_TYPE (MEM[p])
being 1.  And yes, you'd have to look into handled-components as well.  I guess
you'll face similar problems as we do with tree_non_mode_aligned_mem_p in
tree-sra.c (you need to assume possibly misaligned accesses the same way
expansion does for the dereference, otherwise you'll run into issues on
strict-align targets).

As that dereference thing doesn't really fit the CCP propagation, you won't
be able to handle

int foo (int *p)
{
  int *q = (char *)p + 3;
  return *q;
}

and assume q is aligned (and p is misaligned by 1).
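
The arithmetic behind that statement can be checked on addresses alone,
without dereferencing anything.  A small sketch, assuming a 4-byte
alignment requirement and a platform where the integer value of a
pointer reflects its address; toy_misalignment and toy_buf are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return how many bytes P lies past the previous ALIGN-byte boundary;
   0 means P is ALIGN-aligned.  ALIGN must be nonzero.  */
static size_t
toy_misalignment (const void *p, size_t align)
{
  return (size_t) ((uintptr_t) p % align);
}

/* A 4-byte-aligned buffer to take addresses from.  */
static _Alignas (4) char toy_buf[8];
```

If q = (char *)p + 3 is known to be 4-byte aligned, then p must sit one
byte past a 4-byte boundary, e.g. p = toy_buf + 1 and q = toy_buf + 4.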

That is, if the definition of a pointer is post-dominated by a dereference
we could assume proper alignment for that pointer (as opposed to just
special-casing its default definition).  Would be certainly interesting to
see what kind of fallout we would get from that ;)

Richard.

>        Jakub
>


Re: [Patch, testsuite, arm] Skip the arch conflict to enable case neon-thumb2-move pass on more targets.

2011-09-24 Thread Richard Guenther
On Tue, Sep 20, 2011 at 11:35 AM, Terry Guo  wrote:
> Hello,
>
>> >
>> > I suppose you want a torture that exercises different -march/-mtune
>> > combinations then.
>> >
>> > But can't you do the pruning somewhere in an .exp file then instead
>> > of sprinkling it all over the tests itself?
>> >
>>
>> It seems not.  At present the multilib options are all added
>> automatically by the dejagnu infrastructure.
>>
>> R.
>>
>
> Thanks to both of you.  I did try to address Richard Guenther's
> comments, but I failed: I couldn't figure out a better way to solve such
> cases, for the reason Richard Earnshaw gives.  So I will wait for a
> while.  Can I take no objection or no better proposal as OK?

Look at how gcc-dg.exp implements dg-prune-output and makes use of it.
You should be able to append a global pruning for these issues in
arm.exp.

Richard.


> BR,
> Terry
>
>
>
>


Re: [PATCH] Handle &__restrict parameters in tree-ssa-structalias.c like DECL_BY_REFERENCE parameters

2011-09-24 Thread Richard Guenther
On Fri, Sep 23, 2011 at 7:06 PM, Jakub Jelinek  wrote:
> Hi!
>
> This simple patch improves the f3 function in the testcase below,
> a parameter with TYPE_RESTRICT REFERENCE_TYPE IMHO can be safely treated
> like the DECL_BY_REFERENCE case where the source actually didn't contain
> a reference, but compiler created it anyway.  If source contains &__restrict
> parameter, the parameter again points to a chunk of (for the function
> global) restrict memory that nothing inside of the function should access
> otherwise than through this parameter or pointers derived from it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2011-09-23  Jakub Jelinek  
>
>        * tree-ssa-structalias.c (intra_create_variable_infos): Treat
>        TYPE_RESTRICT REFERENCE_TYPE parameters like restricted
>        DECL_BY_REFERENCE parameters.
>
>        * g++.dg/tree-ssa/restrict2.C: New test.
>
> --- gcc/tree-ssa-structalias.c.jj       2011-09-15 12:18:37.0 +0200
> +++ gcc/tree-ssa-structalias.c  2011-09-23 15:36:23.0 +0200
> @@ -5589,10 +5589,11 @@ intra_create_variable_infos (void)
>       varinfo_t p;
>
>       /* For restrict qualified pointers to objects passed by
> -         reference build a real representative for the pointed-to object.  */
> -      if (DECL_BY_REFERENCE (t)
> -         && POINTER_TYPE_P (TREE_TYPE (t))
> -         && TYPE_RESTRICT (TREE_TYPE (t)))
> +         reference build a real representative for the pointed-to object.
> +        Treat restrict qualified references the same.  */
> +      if (TYPE_RESTRICT (TREE_TYPE (t))
> +         && ((DECL_BY_REFERENCE (t) && POINTER_TYPE_P (TREE_TYPE (t)))
> +             || TREE_CODE (TREE_TYPE (t)) == REFERENCE_TYPE))
>        {
>          struct constraint_expr lhsc, rhsc;
>          varinfo_t vi;

I don't see why

  f4 (s, s)

would be invalid.  But you would miscompile it.
(I'm not sure that a restrict qualified component is properly defined
by the C standard - we're just making this extension in a very constrained
case to allow Fortran array descriptors to work).

Richard.

> +
> +int
> +f2 (S x)
> +{
> +  x.p[0] = 2;
> +  s.p[0] = 0;
> +// { dg-final { scan-tree-dump-times "return 2" 1 "optimized" } }
> +  return x.p[0];
> +}
> +
> +int
> +f3 (S &__restrict x, S &__restrict y)
> +{
> +  x.p[0] = 3;
> +  y.p[0] = 0;
> +// { dg-final { scan-tree-dump-times "return 3" 1 "optimized" } }
> +  return x.p[0];
> +}
> +
> +int
> +f4 (S &x, S &y)
> +{
> +  x.p[0] = 4;
> +  y.p[0] = 0;
> +// { dg-final { scan-tree-dump-times "return 4" 0 "optimized" } }
> +  return x.p[0];
> +}
> +
> +int
> +f5 (S *__restrict x, S *__restrict y)
> +{
> +  x->p[0] = 5;
> +  y->p[0] = 0;
> +// We might handle this some day
> +// { dg-final { scan-tree-dump-times "return 5" 0 "optimized" } }
> +  return x->p[0];
> +}
> +
> +int
> +f6 (S *x, S *y)
> +{
> +  x->p[0] = 6;
> +  y->p[0] = 0;
> +// { dg-final { scan-tree-dump-times "return 6" 0 "optimized" } }
> +  return x->p[0];
> +}
> +
> +// { dg-final { cleanup-tree-dump "optimized" } }
>
>        Jakub
>


Re: [PATCH] Handle &__restrict parameters in tree-ssa-structalias.c like DECL_BY_REFERENCE parameters

2011-09-24 Thread Jakub Jelinek
On Sat, Sep 24, 2011 at 01:26:36PM +0200, Richard Guenther wrote:
> > +int
> > +f3 (S &__restrict x, S &__restrict y)
> > +{
> > +  x.p[0] = 3;
> > +  y.p[0] = 0;
> > +// { dg-final { scan-tree-dump-times "return 3" 1 "optimized" } }
> > +  return x.p[0];
> > +}
> > +
> > +int
> > +f4 (S &x, S &y)
> > +{
> > +  x.p[0] = 4;
> > +  y.p[0] = 0;
> > +// { dg-final { scan-tree-dump-times "return 4" 0 "optimized" } }
> > +  return x.p[0];
> > +}

> I don't see why
> 
>   f4 (s, s)
> 
> would be invalid.  But you would miscompile it.

f3 (s, s) you mean?  I believe it is invalid.  For f4 it would be valid
and not optimized out.

> (I'm not sure that a restrict qualified component is properly defined
> by the C standard - we're just making this extension in a very constrained
> case to allow Fortran array descriptors to work).

Well, C standard doesn't have references, and C++ doesn't have restrict.
So it is all about extensions.
But what else would & __restrict be for, if not, like *__restrict, to say
that the referenced (resp. pointed-to) object must not be accessed by any
means other than the reference or references/pointers derived from it, in
the spirit of ISO C99 6.7.3.1?
So, before jumping to __restrict fields, consider
int a[10], b[10];
int *
f8 (S &__restrict x, S &__restrict y)
{
  x.p = a;
  y.p = b;
  return x.p;
}
which we already optimize even before the patch.
It is certainly invalid to call f8 (s, s).
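
For comparison, the same contract expressed in plain C99, which has
restrict but no references.  This is a sketch with hypothetical names
(toy_f3, toy_f4) that mirrors the f3/f4 pair from the testcase using
pointers instead of references.

```c
#include <assert.h>

/* With restrict, the compiler may assume the two stores do not alias
   and return 3 without reloading *x.  Calling this with x == y is
   undefined behavior under C99 6.7.3.1.  */
static int
toy_f3 (int *restrict x, int *restrict y)
{
  *x = 3;
  *y = 0;
  return *x;
}

/* Without restrict, x and y may alias, so *x has to be reloaded.  */
static int
toy_f4 (int *x, int *y)
{
  *x = 4;
  *y = 0;
  return *x;
}
```

Calling toy_f4 (&a, &a) is valid and returns 0; the corresponding call
to toy_f3 has no defined result, which is what lets an optimizer fold
the return value.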

And the restricted fields, it is a straightforward extension to the restrict
definition of ISO C99.  We don't use it just for Fortran descriptors, but
e.g. std::valarray uses __restrict fields too and has that backed up by the
C++ standard requirements.  Two different std::valarray objects will have
different pointers inside of the structure.

My intent currently is to be able to vectorize:
#include <valarray>

std::valarray<int>
f9 (std::valarray<int> a, std::valarray<int> b, std::valarray<int> c, int z)
{
  int i;
  for (i = 0; i < z; i++)
    {
      a[i] = b[i] + c[i];
      a[i] += b[i] * c[i];
    }
  return a;
}

void
f10 (std::valarray<int> &__restrict a, std::valarray<int> &__restrict b,
     std::valarray<int> &__restrict c, int z)
{
  int i;
  for (i = 0; i < z; i++)
    {
      a[i] = b[i] + c[i];
      a[i] += b[i] * c[i];
    }
}

In f9 we currently handle it differently than in f10, while IMHO it should
be the same thing, a is guaranteed in both cases not to alias b nor c and b
is guaranteed not to alias c, furthermore, a._M_data[0] through a._M_data[z-1]
is guaranteed not to alias b._M_data[0] through b._M_data[z-1] and c._M_data[0]
through c._M_data[z-1] and similarly for b vs. c.  The __restrict on the
_M_data field in std::valarray is a hint that different std::valarray
objects will have different pointers.
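
The non-overlap guarantee described here has a direct C analogue with
restrict-qualified pointers.  A minimal sketch (toy_f10 is a
hypothetical name; whether a given compiler actually vectorizes the loop
depends on flags and target):

```c
#include <assert.h>
#include <stddef.h>

/* C analogue of the f10 loop: restrict on all three pointers promises
   that the arrays do not overlap, so the iterations are independent.  */
static void
toy_f10 (int *restrict a, const int *restrict b, const int *restrict c,
         size_t z)
{
  size_t i;

  for (i = 0; i < z; i++)
    {
      a[i] = b[i] + c[i];
      a[i] += b[i] * c[i];
    }
}
```

Without the qualifiers the compiler would have to allow for a store to
a[i] changing a later b[j] or c[j], typically by emitting a runtime
overlap check or giving up on vectorization.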

In f9 we have:
  size_tD.1850 D.53593;
  intD.9 * restrict D.53592;
  intD.9 & D.53591;
...
  D.53592_7 = MEM[(struct valarrayD.50086 *)aD.50087_6(D) + 8B];
  D.53593_42 = D.53456_5 * 4;
  # PT = nonlocal escaped { D.53660 } (restr)
  D.53591_43 = D.53592_7 + D.53593_42;
...
  *D.53591_43 = D.53462_19;
and while PTA computes the restricted property here, we unfortunately still
don't use it, because D.53591 (which comes from all the inlined wrappers)
isn't TYPE_RESTRICT.  Shouldn't we propagate that property to either
SSA_NAMEs initialized from restricted pointers resp. POINTER_PLUS_EXPRs,
or if it is common to all VAR_DECLs underlying such SSA_NAMEs, to the
VAR_DECLs?

But in f10 we don't get even that far, the a._M_data (which is actually
a->_M_data, since a is a (restricted) reference) load is already itself
not considered as restricted by PTA.

It is nice that we optimize Fortran arrays well, but it would be nice if we
did the same for C++ too.

Jakub


Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)

2011-09-24 Thread Michael Witten
> Re: [google] Linker plugin to do function reordering...

Is there a particularly good reason for why you guys
slip `[google]' into all of your `Subject:' lines?

I was under the impression that this list is for work
on GCC. Consider putting something germane in the
brackets instead.


Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)

2011-09-24 Thread Diego Novillo

On 11-09-24 09:37 , Michael Witten wrote:
>> Re: [google] Linker plugin to do function reordering...
>
> Is there a particularly good reason for why you guys
> slip `[google]' into all of your `Subject:' lines?

Yes, labels in brackets tend to be markers for branches, version
numbers, specific modules.  In this case, they're used to indicate
patches to one of the google branches (http://gcc.gnu.org/svn.html,
http://gcc.gnu.org/ml/gcc/2011-01/msg00246.html).



Diego.


Re: Force aliases to be output when cgraph decides so

2011-09-24 Thread David Edelsohn
Jan,

This patch causes a bootstrap failure on AIX because some symbols no
longer are exported by libstdc++.  When I remove your patch, bootstrap
proceeds past this failure.

David

exec(): 0509-036 Cannot load program /tmp/20110923/./gcc/cc1plus because of the following errors:
        0509-130 Symbol resolution failed for /usr/gnu/lib/libgmpxx.a(libgmpxx.so.4) because:
        0509-136   Symbol _ZNSt6localeD1Ev (number 4) is not exported from
                   dependent module /tmp/20110923/powerpc-ibm-aix5.3.0.0/libstdc++-v3/src/.libs/libstdc++.a(libstdc++.so.6).
        0509-136   Symbol _ZNSt6localeC1ERKS_ (number 6) is not exported from
                   dependent module /tmp/20110923/powerpc-ibm-aix5.3.0.0/libstdc++-v3/src/.libs/libstdc++.a(libstdc++.so.6).
        0509-136   Symbol _ZNSt8ios_base4InitD1Ev (number 10) is not exported from
                   dependent module /tmp/20110923/powerpc-ibm-aix5.3.0.0/libstdc++-v3/src/.libs/libstdc++.a(libstdc++.so.6).
        0509-136   Symbol _ZNSt8ios_base4InitC1Ev (number ) is not exported from
                   dependent module /tmp/20110923/powerpc-ibm-aix5.3.0.0/libstdc++-v3/src/.libs/libstdc++.a(libstdc++.so.6).


Re: [PATCH PR43513, 1/3] Replace vla with array - Implementation.

2011-09-24 Thread Eric Botcazou
> This is an updated version of the patch. I have 2 new patches and an
> updated testcase which I will sent out individually.
>
> Patch set was bootstrapped and reg-tested on x86_64.
>
> Ok for trunk?
>
> Thanks,
> - Tom
>
> 2011-07-30  Tom de Vries  
>
>   PR middle-end/43513
>   * Makefile.in (tree-ssa-ccp.o): Add $(PARAMS_H) to rule.
>   * tree-ssa-ccp.c (params.h): Include.
>   (fold_builtin_alloca_for_var): New function.
>   (ccp_fold_stmt): Use fold_builtin_alloca_for_var.

We have detected another fallout on some Ada code: the transformation replaces 
a call to __builtin_alloca with &var, i.e. it introduces an aliased variable, 
which invalidates the points-to information of some subsequent call, fooling 
DSE into thinking that it can eliminate a live store.

The brute force approach

Index: tree-ssa-ccp.c
===
--- tree-ssa-ccp.c  (revision 179038)
+++ tree-ssa-ccp.c  (working copy)
@@ -2014,7 +2014,10 @@ do_ssa_ccp (void)
   ccp_initialize ();
   ssa_propagate (ccp_visit_stmt, ccp_visit_phi_node);
   if (ccp_finalize ())
-return (TODO_cleanup_cfg | TODO_update_ssa | TODO_remove_unused_locals);
+return (TODO_cleanup_cfg
+   | TODO_update_ssa
+   | TODO_rebuild_alias
+   | TODO_remove_unused_locals);
   else
 return 0;
 }

works, but we might want to be more clever.  Thoughts?

-- 
Eric Botcazou


[patch] couple of small optimization tweaks

2011-09-24 Thread Eric Botcazou
Hi,

this is a couple of small tweaks to the GIMPLE optimizers aimed at helping 
vectorization in Ada.  More changes will be needed, so no testcases yet.


  1. pass_fold_builtins knows how to delete a call to __builtin_stack_restore 
that is the only real statement in a cleanup, i.e. to turn

:
  .builtin_stack_restore (saved_stack.116_22);
  resx 1

into

:
  GIMPLE_NOP
  resx 1

This is valid when the block has no outgoing edge.  Then ehcleanup can in turn 
delete the cleanup as empty.  The problem is that pass_fold_builtins runs very 
late in the game, so most of the optimizations, e.g. vectorization, are 
constrained by the EH structure, all the more so if -fnon-call-exceptions is 
enabled like in Ada.  This happens as soon as you do dynamic stack allocation, 
because then you have a call to __builtin_stack_restore in the FINALLY part of 
a TRY-FINALLY structure, hence the cleanup.

The first change enhances ehcleanup so as to remove this kind of cleanup on
its own, without the need to wait for pass_fold_builtins.
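
The relaxed check can be sketched on a toy block representation.  This
is an illustration only: toy_stmt and toy_cleanup_is_empty are
hypothetical, and a block is modelled as a plain array of statement
kinds rather than GCC's gimple sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement kinds; GCC's gimple statements carry much more.  */
enum toy_stmt { TOY_STACK_RESTORE, TOY_RESX, TOY_OTHER };

/* Return nonzero if the landing-pad block STMTS[0..N-1] can be treated
   as an empty cleanup: a lone RESX or, when the block has no outgoing
   edge, a stack restore immediately followed by the final RESX.  */
static int
toy_cleanup_is_empty (const enum toy_stmt *stmts, size_t n,
                      int has_outgoing_edge)
{
  size_t i = 0;

  if (n == 0)
    return 0;

  /* Skip a leading stack restore only when nothing follows the block.  */
  if (!has_outgoing_edge && stmts[i] == TOY_STACK_RESTORE)
    i++;

  /* What remains must be exactly one RESX.  */
  return i + 1 == n && stmts[i] == TOY_RESX;
}
```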


  2. The introduction of MEM_REF has disabled vectorization for parameters 
passed by reference in Ada.  This used to work because dr_analyze_innermost 
was always building a pointer via build_fold_addr_expr; now, if the base is a 
MEM_REF, it can just take the first operand, which is a reference and not a 
pointer for a parameter passed by reference in Ada.  And simple_iv, unlike 
other functions in tree-scalar-evolution.c, accepts only pointer types:

  type = TREE_TYPE (op);
  if (TREE_CODE (type) != INTEGER_TYPE
  && TREE_CODE (type) != POINTER_TYPE)
return false;

The second change makes simple_iv use the same test as the other functions in 
tree-scalar-evolution.c.
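
The effect of switching from exact tree codes to the class predicates
can be shown on a toy enumeration.  The toy_* names below are
hypothetical; the two helper predicates only mirror the spirit of
POINTER_TYPE_P (pointer or reference) and INTEGRAL_TYPE_P (integer,
enumeral or boolean).

```c
#include <assert.h>

/* Toy subset of tree codes, for illustration only.  */
enum toy_code
{
  TOY_INTEGER_TYPE, TOY_ENUMERAL_TYPE, TOY_BOOLEAN_TYPE,
  TOY_POINTER_TYPE, TOY_REFERENCE_TYPE, TOY_REAL_TYPE
};

/* Pointers and references.  */
static int
toy_pointer_type_p (enum toy_code c)
{
  return c == TOY_POINTER_TYPE || c == TOY_REFERENCE_TYPE;
}

/* Integer, enumeral and boolean types.  */
static int
toy_integral_type_p (enum toy_code c)
{
  return c == TOY_INTEGER_TYPE || c == TOY_ENUMERAL_TYPE
         || c == TOY_BOOLEAN_TYPE;
}

/* Check before the change: exact tree codes only.  */
static int
toy_simple_iv_accepts_old (enum toy_code c)
{
  return c == TOY_INTEGER_TYPE || c == TOY_POINTER_TYPE;
}

/* Check after the change.  */
static int
toy_simple_iv_accepts_new (enum toy_code c)
{
  return toy_pointer_type_p (c) || toy_integral_type_p (c);
}
```

The reference case is the one that matters for by-reference parameters
in Ada: it is rejected by the old check and accepted by the new one.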


Bootstrapped/regtested on x86_64-suse-linux, OK for mainline?


2011-09-24  Eric Botcazou  

* tree-eh.c (is_gimple_stack_restore): New predicate.
(cleanup_empty_eh): Allow a call to __builtin_stack_restore if there
is no outgoing edge.

* tree-scalar-evolution.c (simple_iv): Accept all pointer and integral
types.


-- 
Eric Botcazou
Index: tree-eh.c
===
--- tree-eh.c	(revision 179038)
+++ tree-eh.c	(working copy)
@@ -3818,6 +3818,26 @@ infinite_empty_loop_p (edge e_first)
   return inf_loop;
 }
 
+/* Return true if STMT is a call to __builtin_stack_restore.  */
+
+static bool
+is_gimple_stack_restore (gimple stmt)
+{
+  tree callee;
+
+  if (gimple_code (stmt) != GIMPLE_CALL)
+return false;
+
+  callee = gimple_call_fndecl (stmt);
+
+  if (callee
+  && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL
+  && DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE)
+return true;
+
+  return false;
+}
+
 /* Examine the block associated with LP to determine if it's an empty
handler for its EH region.  If so, attempt to redirect EH edges to
an outer region.  Return true the CFG was updated in any way.  This
@@ -3863,8 +3883,15 @@ cleanup_empty_eh (eh_landing_pad lp)
   return cleanup_empty_eh_unsplit (bb, e_out, lp);
 }
 
-  /* The block should consist only of a single RESX statement.  */
+  /* The block should consist only of a single RESX statement, modulo a
+ preceding call to __builtin_stack_restore if there is no outgoing
+ edge, since the call can be eliminated in this case.  */
   resx = gsi_stmt (gsi);
+  if (!e_out && is_gimple_stack_restore (resx))
+{
+  gsi_next (&gsi);
+  resx = gsi_stmt (gsi);
+}
   if (!is_gimple_resx (resx))
 return false;
   gcc_assert (gsi_one_before_end_p (gsi));
Index: tree-scalar-evolution.c
===
--- tree-scalar-evolution.c	(revision 179038)
+++ tree-scalar-evolution.c	(working copy)
@@ -3172,8 +3172,8 @@ simple_iv (struct loop *wrto_loop, struc
   iv->no_overflow = false;
 
   type = TREE_TYPE (op);
-  if (TREE_CODE (type) != INTEGER_TYPE
-  && TREE_CODE (type) != POINTER_TYPE)
+  if (!POINTER_TYPE_P (type)
+  && !INTEGRAL_TYPE_P (type))
 return false;
 
   ev = analyze_scalar_evolution_in_loop (wrto_loop, use_loop, op,


Re: [PATCH] Handle &__restrict parameters in tree-ssa-structalias.c like DECL_BY_REFERENCE parameters

2011-09-24 Thread Jason Merrill

On 09/24/2011 07:26 AM, Richard Guenther wrote:

I don't see why

   f4 (s, s)

would be invalid.  But you would miscompile it.



+int
+f4 (S&x, S&y)
+{
+  x.p[0] = 4;
+  y.p[0] = 0; // { dg-final { scan-tree-dump-times "return 4" 0 "optimized" } }
+  return x.p[0];
+}


It looks to me like the testcase is testing that we *don't* optimize f4, 
which I think is the correct result.



+// We might handle this some day
+// { dg-final { scan-tree-dump-times "return 5" 0 "optimized" } }


But we could optimize f5, so I don't think we want to test for not 
optimizing.  Better would be to test for the optimization, but mark it 
as xfail.


Jason


[v3] libstdc++/50509

2011-09-24 Thread Paolo Carlini

Hi,

committed mainline and 4_6-branch.

Paolo.

//
2011-09-24  John Salmon  

PR libstdc++/50509
* include/bits/random.tcc (seed_seq::generate): Fix computation.
Index: include/bits/random.tcc
===
--- include/bits/random.tcc (revision 179143)
+++ include/bits/random.tcc (working copy)
@@ -2768,7 +2768,7 @@
  _Type __arg = (__begin[__k % __n]
 ^ __begin[(__k + __p) % __n]
 ^ __begin[(__k - 1) % __n]);
- _Type __r1 = __arg ^ (__arg << 27);
+ _Type __r1 = __arg ^ (__arg >> 27);
  __r1 = __detail::__mod<_Type, __detail::_Shift<_Type, 32>::__value,
 1664525u, 0u>(__r1);
  _Type __r2 = __r1;
@@ -2790,7 +2790,7 @@
  _Type __arg = (__begin[__k % __n]
 + __begin[(__k + __p) % __n]
 + __begin[(__k - 1) % __n]);
- _Type __r3 = __arg ^ (__arg << 27);
+ _Type __r3 = __arg ^ (__arg >> 27);
  __r3 = __detail::__mod<_Type, __detail::_Shift<_Type, 32>::__value,
 1566083941u, 0u>(__r3);
  _Type __r4 = __r3 - __k % __n;
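
As a sketch of what the shift fix is about: C++11 [rand.util.seedseq]
defines the mixing step as T(x) = x xor (x rshift 27), so the shift has to
go to the right.  The helper names below are made up for this note and the
code assumes that unsigned int is 32 bits wide; it is not the libstdc++
implementation:

```cpp
#include <cstdint>

// T(x) = x xor (x rshift 27) on 32-bit values.
inline std::uint32_t seed_seq_T (std::uint32_t x)
{
  return x ^ (x >> 27);
}

// r1 for one step of the first loop: 1664525 * T(a xor b xor c), where
// a, b and c stand for begin[k], begin[k+p] and begin[k-1].  The
// multiplication on std::uint32_t wraps modulo 2^32.
inline std::uint32_t seed_seq_r1 (std::uint32_t a, std::uint32_t b,
                                  std::uint32_t c)
{
  return 1664525u * seed_seq_T (a ^ b ^ c);
}
```

With the old left shift, T(1) would have been 1 ^ (1 << 27) instead of 1.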


Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-09-24 Thread Gerald Pfeifer
On Thu, 22 Sep 2011, Kirill Yukhin wrote:
> a typo fixed.

Thanks, Kirill.  Note you were attaching the patch as 
Application/OCTET-STREAM which does not generally view nicely for
others; perhaps just include the patch in the body of the mail to
avoid that?

Index: htdocs/gcc-4.7/changes.html
===
+ Implementation and automatic generation of __builtin_clz*
+   using lzcnt instruction is available via -mlzcnt.

...using the ... instruction   (Add "the").

+ A new -mfsgsbase option is available to enable GCC
+ to use new segment register read/write instructions through dedicated built-ins.

Perhaps say "command-line option" instead of just option?  Though we
don't have that in the earlier cases either.

And "available that makes GCC use"?

How does this happen via built-ins?  From a user perspective, isn't this
just emitting the respective assembler instructions?  If so, perhaps just
say "that makes GCC generate"

+ Support for new Intel rdrnd instruction is available via -mrdrnd.

"the new Intel"

Fine from my perspective with these changes, though please give an
x86 maintainer time to chime in, too, before you commit.

Gerald


Re: misbehaviour with md5_process_bytes and maybe in optimization

2011-09-24 Thread Ian Lance Taylor
Basile Starynkevitch  writes:

> On Fri, 23 Sep 2011 12:19:57 -0700
> Ian Lance Taylor  wrote:
>
>> I committed this patch to mainline to fix the problem.  Bootstrapped on
>> x86_64-unknown-linux-gnu.
>> 
>> 2011-09-23  Ian Lance Taylor  
>> 
>>  * md5.c (md5_process_bytes): Correct handling of unaligned
>>  buffer.
>
> This is *exactly* the same patch as
> Pierre Vittet proposed in 
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00963.html
> (but Pierre's patch has not been reviewed).
>
> Perhaps the ChangeLog might also mention Pierre Vittet for that particular 
> patch???

Sorry, I didn't see his patch.

Sure, go ahead and change libiberty/ChangeLog.

Note that as I explained in my message we should not remove the
_STRING_ARCH_unaligned test.

Ian


Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)

2011-09-24 Thread Michael Witten
On Sat, 24 Sep 2011 10:00:37 -0400, Diego Novillo wrote:

> On 11-09-24 09:37 , Michael Witten wrote:
>>> Re: [google] Linker plugin to do function reordering...
>>
>> Is there a particularly good reason for why you guys
>> slip `[google]' into all of your `Subject:' lines?
>
> Yes, labels in brackets tend to be markers for branches, version 
> numbers, specific modules.  In this case, they're used to indicate 
> patches to one of the google branches (http://gcc.gnu.org/svn.html, 
> http://gcc.gnu.org/ml/gcc/2011-01/msg00246.html).

From that email:

> google/integration
>A branch following trunk that contains some minimal patches that
> are likely not useful anywhere outside of Google's build environment.
> These are typically configuration patches.

Why is gnu.gcc.org hosting work that is specific to some company's
build system?

> google/main
>A branch of google/integration that contains Google local patches
> that we are looking to contribute to trunk.  Some of these patches are
> either in the process of being reviewed, or have not yet been
> proposed.  The intent of this branch is to serve as a staging platform
> to allow collaboration with external developers.  Patches in this
> branch are only expected to remain here until they are reviewed and
> accepted in trunk.

Why is it necessary to announce a patch [series] for this branch when it
is intended that such a patch [series] make it to the trunk? Shouldn't an
employee of your company submit a `trunk'-worthy patch [series] for review
as would anyone else?

Isn't having one branch named `google' (or `google/maint') too ridiculously
generic to be of any use whatsoever? Wouldn't it make far more sense to have
a topic branch if deemed necessary (for, say, a large patch series)?

Why is gnu.gcc.org hosting such a pointless branch? Is it just that the
technical inadequacies of SVN made it easier for your multi-billion-dollar
company to host its essentially private work in GNU's repository?

Furthermore, looking at the `Subject' header of this email:

  Subject: Re: [google] Linker plugin to do function reordering using
callgraph edge profiles (issue5124041)

I wonder what `issue5124041' means. Is that a reference that only has
meaning for employees of your company?

Sincerely,
Michael Witten


Re: [PATCH] non-GNU C++ compilers

2011-09-24 Thread Marc Glisse

On Sat, 17 Sep 2011, Joseph S. Myers wrote:


These are OK (with ChangeLog entries properly omitting the "include/",
since they go in include/ChangeLog) in the absence of libiberty maintainer
objections within 72 hours.


Thanks. Is someone willing to commit them now they have been accepted? I 
am attaching them as a single patch and copying the changelog entries here 
for convenience (I wrote the date of Monday because it looks like a day 
where someone might have time to commit...).


include/ChangeLog:

2011-09-26  Ulrich Drepper  

* obstack.h [!GNUC] (obstack_free): Avoid cast to int.

2011-09-26  Marc Glisse  

* ansidecl.h (ENUM_BITFIELD): Always use enum in C++

--
Marc Glisse
Index: include/ansidecl.h
===
--- include/ansidecl.h  (revision 179146)
+++ include/ansidecl.h  (working copy)
@@ -416,10 +416,12 @@
 #define EXPORTED_CONST const
 #endif
 
-/* Be conservative and only use enum bitfields with GCC.
+/* Be conservative and only use enum bitfields with C++ or GCC.
FIXME: provide a complete autoconf test for buggy enum bitfields.  */
 
-#if (GCC_VERSION > 2000)
+#ifdef __cplusplus
+#define ENUM_BITFIELD(TYPE) enum TYPE
+#elif (GCC_VERSION > 2000)
 #define ENUM_BITFIELD(TYPE) __extension__ enum TYPE
 #else
 #define ENUM_BITFIELD(TYPE) unsigned int
Index: include/obstack.h
===
--- include/obstack.h   (revision 179146)
+++ include/obstack.h   (working copy)
@@ -532,9 +532,9 @@
 # define obstack_free(h,obj)   \
 ( (h)->temp = (char *) (obj) - (char *) (h)->chunk,\
   (((h)->temp > 0 && (h)->temp < (h)->chunk_limit - (char *) (h)->chunk)\
-   ? (int) ((h)->next_free = (h)->object_base  \
-   = (h)->temp + (char *) (h)->chunk)  \
-   : (((obstack_free) ((h), (h)->temp + (char *) (h)->chunk), 0), 0)))
+   ? (((h)->next_free = (h)->object_base   \
+   = (h)->temp + (char *) (h)->chunk), 0)  \
+   : ((obstack_free) ((h), (h)->temp + (char *) (h)->chunk), 0)))
 
 #endif /* not __GNUC__ or not __STDC__ */
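
A small sketch of how the C++ branch of the patched ENUM_BITFIELD macro is
used.  Enumeration bit-fields are standard in C++, whereas ISO C only
guarantees a few integer types for bit-fields, which is presumably why the
GCC branch keeps __extension__.  The enum and struct names are hypothetical
and exist only for this example:

```cpp
// Mirrors the C++ branch of the patched macro.
#define ENUM_BITFIELD(TYPE) enum TYPE

// Hypothetical enumeration with three values; two bits are enough.
enum toy_kind { KIND_NONE, KIND_REG, KIND_MEM };

struct toy_operand
{
  ENUM_BITFIELD (toy_kind) kind : 2;   // expands to: enum toy_kind kind : 2;
  unsigned int regno : 6;
};

// Build an operand of kind KIND_MEM for the given register number.
inline toy_operand make_mem_operand (unsigned int regno)
{
  toy_operand op;
  op.kind = KIND_MEM;
  op.regno = regno;
  return op;
}
```

Reading op.kind back yields an enum toy_kind value, so no cast from
unsigned int is needed at the use sites.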
 


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-24 Thread Hans-Peter Nilsson
On Sat, 24 Sep 2011, David Miller wrote:
> Hans, here is what I'm playing with right now against current
> trunk.

A spot-check review:

> I looked at the use cases for making use of the scale factor in the
> VIS %gsr register and it's used similar to how rounding modes are
> modified in the FPU control register.

It's more of a parameter actually, GSR.scale_factor is the
bit-shift count for the pack insns and GSR.alignaddr_offset the
byte-shift in the aligndata insns.

> You have a function, or family of functions, that want to operate with
> a certain scale factor.  And at the top level the first thing you do
> is set the %gsr as you want it to be set.

Certainly an improvement, but...

> So I've added a GSR register to the sparc backend and then added
> __vis_write_gsr() and __vis_read_gsr() functions to facilitate the
> use cases I've seen.

I'd prefer it as a parameter to the builtins (expanding to two
insns, letting gcc get rid of the redundant ones; let the
initialization value be 0).  I understand you're trying to keep
some kind of compatibility there, but additional builtins would
do the trick and fit nicely: the new builtins expanding to a set
of GSR (GSR field) followed by the "old" insn but fixed as in
this patch.  Besides, the functions that use GSR still can't be
const in this patch.  I guess they never can, when you think of
it, setting and/or using a register that can affect/be affected
something elsewhere, when that something is known to gcc.  Oh well.

Another aspect would be to model the different GSR fields as
different registers; they're used completely differently and
just happen to be set with the same insn.  That might help gcc
getting rid of redundant settings.

> This allowed me to describe to the compiler exactly what the alignaddr
> instructions do, and thus the unspecs for them are now gone.
>
> The pack and faligndata intrinsics still need to be unspec,

FWIW not "need"; IIUC at least faligndata *can* be a vec_select
of a vec_concat of the two vectors, but in practice I don't
think gcc can make use of it yet and all ports use unspec...

While on faligndata, see vec_realign_load_<mode> (sadly
undocumented at present); it'll enable the autovectorizer to...
autovectorize some more.  (Right, I'm working on [yet] another
SIMD back-end, implemented as MIPS COP2 insns.)

> and thus I
> merely added GSR uses to those patterns which is enough to let the
> compiler get the dataflow right.

How about putting it inside the unspec vector?  Those "use"
thingies always gives me the creeps; outside of an insn (no, not
here) they're sometimes lost and at least disconnected to the
insn.  I think practically there's no difference here.

> This all seems sufficient for what things like Sun's medialib and your
> RAPP project want to do.
>
> I'll look into your other suggestion in PR48974, namely making use of
> fone VIS instructions.

One more: please consider adding a
 if (TARGET_VIS) builtin_define ("__VIS__=something") so I as a
user theoretically wouldn't *have* to autoconfiscate for the
changes. ;)

> +  def_builtin_const ("__builtin_vis_fpack16", CODE_FOR_fpack16_vis,
> +  v4qi_ftype_v4hi);
> +  def_builtin_const ("__builtin_vis_fpack32", CODE_FOR_fpack32_vis,
> +  v8qi_ftype_v2si_v8qi);
> +  def_builtin_const ("__builtin_vis_fpackfix", CODE_FOR_fpackfix_vis,
> +  v2hi_ftype_v2si);
>def_builtin_const ("__builtin_vis_fexpand", CODE_FOR_fexpand_vis,
>v4hi_ftype_v4qi);

No, they (and aligndata) can't be const as long as they're
affected by something other than their parameters (GSR); pure
yes but not const.  See extend.texi.


> +  def_builtin_const ("__builtin_vis_alignaddr", CODE_FOR_alignaddrdi_vis,
> +  ptr_ftype_ptr_di);
> +  def_builtin_const ("__builtin_vis_alignaddrl", 
> CODE_FOR_alignaddrldi_vis,
> +  ptr_ftype_ptr_di);

Can't be neither pure nor const; affects something global (GSR).
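
The three cases can be sketched with GCC's function attributes.  This is
a model only: toy_gsr is an ordinary global standing in for the %gsr
register, and the toy_* functions are invented for this note and do not
reproduce the real VIS arithmetic:

```cpp
// Stand-in for %gsr; on real hardware this is machine state.
unsigned long long toy_gsr = 0;

// const: the result depends on the arguments only.
__attribute__ ((const)) inline int toy_double (int x)
{
  return x * 2;
}

// pure: reads global state (the scale field, bits <7:3>) but has no
// side effects, so it may not be marked const.
__attribute__ ((pure)) inline int toy_pack (int x)
{
  unsigned int scale = (unsigned int) ((toy_gsr >> 3) & 0x1f);
  return x << scale;
}

// Neither pure nor const: writes global state (the align field,
// bits <2:0>) in addition to returning the aligned address.
inline unsigned long long toy_alignaddr (unsigned long long addr,
                                         unsigned long long off)
{
  unsigned long long sum = addr + off;
  toy_gsr = (toy_gsr & ~0x7ull) | (sum & 0x7ull);
  return sum & ~0x7ull;
}
```

Two calls to toy_pack with the same argument may return different values
if toy_gsr is written in between, which is exactly what const would
forbid the compiler to assume.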

BTW, vector header files are overrated, at least when there's no
compiler compatibility expected.  They can even be in the way:
there's an ARM NEON PR being stalled because of concern that the
header could be used with another gcc version.  Bah. ...ok I see
visintrin.h is already in.  Never mind then. :)

brgds, H-P


Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)

2011-09-24 Thread Diego Novillo
I think you may be trolling, but I'll give you the benefit of the
doubt since you seem to be lacking some background.

On Sat, Sep 24, 2011 at 15:19, Michael Witten  wrote:

> Why is gnu.gcc.org hosting work that is specific to some company's
> build system?

We've long allowed different companies to hold branches on gcc.gnu.org.
From one of the links I posted in my previous response:
http://gcc.gnu.org/svn.html#distrobranches

> Why is it necessary to announce a patch [series] for this branch when it
> is intended that such a patch [series] make it to the trunk? Shouldn't an
> employee of your company submit a `trunk'-worthy patch [series] for review
> as would anyone else?

Yes, and you will see several patches from google.com addresses that
are not labeled [google].  Those are meant for trunk or devel
branches.  It is true that if a patch is meant for trunk, it should
not have a branch tag.  I expect slipups like that to occur from time
to time.  Thanks for pointing it out.

> I wonder what `issue5124041' means. Is that a reference that only has
> meaning for employees of your company?

No.  This is Rietveld, an open source code review system.  I suggested
using it for code reviews a while ago and contributed a script to
facilitate using it with GCC.  See http://gcc.gnu.org/wiki/rietveld


Diego.


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-24 Thread David Miller
From: Hans-Peter Nilsson 
Date: Sat, 24 Sep 2011 17:15:06 -0400 (EDT)

> It's more of a parameter actually, GSR.scale_factor is the
> bit-shift count for the pack insns and GSR.alignaddr_offset the
> byte-shift in the aligndata insns.

I realize this.

> I'd prefer it as a parameter to the builtins (expanding to two
> insns, letting gcc get rid of the redundant ones; let the
> initialization value be 0).  I understand you're trying to keep
> some kind of compatibility there, but additional builtins would
> do the trick and fit nicely: the new builtins expanding to a set
> of GSR (GSR field) followed by the "old" insn but fixed as in
> this patch.  Besides, the functions that use GSR still can't be
> const in this patch.  I guess they never can, when you think of
> it, setting and/or using a register that can affect/be affected
> something elsewhere, when that something is known to gcc.  Oh well.

I read this idea in your PR before I did this work and I disagree that
this is a better approach, because then I have to assume that you care
about all the other bits in the %gsr register.

So on the first set I'd have to read it, mask it out, then set the
scale bits.  A needless waste of 20 to 30 cycles on UltraSPARC-III.

If you just call "__vis_write_gsr()" at the beginning of your kernel,
you can tell the compiler that you just want to set the scaling bits
and you don't care about the others at all.

> Another aspect would be to model the different GSR fields as
> different registers; they're used completely differently and
> just happen to be set with the same insn.  That might help gcc
> getting rid of redundant settings.

Again, this doesn't allow the user to say "don't care" about the other
fields like a plain "__vis_write_gsr(2<<3)" call does.

You know what fields actually matter for your code.

> FWIW not "need"; IIUC at least faligndata *can* be a vec_select
> of a vec_concat of the two vectors, but in practice I don't
> think gcc can make use of it yet and all ports use unspec...
> 
> While on faligndata, see vec_realign_load_<mode> (sadly
> undocumented at present); it'll enable the autovectorizer to...
> autovectorize some more.  (Right, I'm working on [yet] another
> SIMD back-end, implemented as MIPS COP2 insns.)

Thanks for these suggestions.

> How about putting it inside the unspec vector?  Those "use"
> thingies always gives me the creeps; outside of an insn (no, not
> here) they're sometimes lost and at least disconnected to the
> insn.  I think practically there's no difference here.

The canonical thing to do is to put them outside of the unspec
so that is what I have done.

> One more: please consider adding a
>  if (TARGET_VIS) builtin_define ("__VIS__=something") so I as a
> user theoretically wouldn't *have* to autoconfiscate for the
> changes. ;)

This is on my todo list as well, I'll try to emit some CPP define
compatible with what Sun uses.  But, thanks for reminding me.

>> +  def_builtin_const ("__builtin_vis_fpack16", CODE_FOR_fpack16_vis,
>> + v4qi_ftype_v4hi);
>> +  def_builtin_const ("__builtin_vis_fpack32", CODE_FOR_fpack32_vis,
>> + v8qi_ftype_v2si_v8qi);
>> +  def_builtin_const ("__builtin_vis_fpackfix", CODE_FOR_fpackfix_vis,
>> + v2hi_ftype_v2si);
>>def_builtin_const ("__builtin_vis_fexpand", CODE_FOR_fexpand_vis,
>>   v4hi_ftype_v4qi);
> 
> No, they (and aligndata) can't be const as long as they're
> affected by something other than their parameters (GSR); pure
> yes but not const.  See extend.texi.

Good catch, I was thinking purely on the RTL level where we do show
the compiler all of the "inputs" but at the tree level this is not
visible.

I'll fix that up for the next revision.

>> +  def_builtin_const ("__builtin_vis_alignaddr", 
>> CODE_FOR_alignaddrdi_vis,
>> + ptr_ftype_ptr_di);
>> +  def_builtin_const ("__builtin_vis_alignaddrl", 
>> CODE_FOR_alignaddrldi_vis,
>> + ptr_ftype_ptr_di);
> 
> Can't be neither pure nor const; affects something global (GSR).

Gotcha.

I'd like to revisit this at some point in the future though, maybe we
can legitimately at least mark these things pure.


Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)

2011-09-24 Thread Mike Stump
On Sep 24, 2011, at 12:19 PM, Michael Witten wrote:
> Why is gnu.gcc.org hosting work that is specific to some company's
> build system?

This list isn't for this topic.  If you want, please, really, go play in 
gnu.misc.discuss.  This list is for technical patches and the technical review 
of such.  If your email isn't of that nature, then it is off-topic for this 
list.  Thanks.

> Why is gnu.gcc.org hosting such a pointless branch? Is it just that the
> technical inadequacies of SVN made it easier for your multi-billion-dollar
> company to host its essentially private work in GNU's repository?

This isn't GNU's repository, this is GCC's repository.


[v3] libstdc++/50510

2011-09-24 Thread Paolo Carlini

Hi,

committed to mainline and 4_6-branch.

Paolo.

///
2011-09-24  John Salmon  

PR libstdc++/50510
* include/bits/random.tcc (seed_seq::generate): Fix computation.
Index: include/bits/random.tcc
===
--- include/bits/random.tcc (revision 179144)
+++ include/bits/random.tcc (working copy)
@@ -2796,8 +2796,8 @@
  _Type __r4 = __r3 - __k % __n;
  __r4 = __detail::__mod<_Type,
   __detail::_Shift<_Type, 32>::__value>(__r4);
- __begin[(__k + __p) % __n] ^= __r4;
- __begin[(__k + __q) % __n] ^= __r3;
+ __begin[(__k + __p) % __n] ^= __r3;
+ __begin[(__k + __q) % __n] ^= __r4;
  __begin[__k % __n] = __r4;
}
 }


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-24 Thread Hans-Peter Nilsson
On Sat, 24 Sep 2011, David Miller wrote:
> From: Hans-Peter Nilsson 
> Date: Sat, 24 Sep 2011 17:15:06 -0400 (EDT)
> > I'd prefer it as a parameter to the builtins (expanding to two
> > insns, letting gcc get rid of the redundant ones; let the
> > initialization value be 0).  I understand you're trying to keep
> > some kind of compatibility there, but additional builtins would
> > do the trick and fit nicely: the new builtins expanding to a set
> > of GSR (GSR field) followed by the "old" insn but fixed as in
> > this patch.  Besides, the functions that use GSR still can't be
> > const in this patch.  I guess they never can, when you think of
> > it, setting and/or using a register that can affect/be affected
> > something elsewhere, when that something is known to gcc.  Oh well.
>
> I read this idea in your PR before I did this work and I disagree that
> this is a better approach, because then I have to assume that you care
> about all the other bits in the %gsr register.

I don't understand what you mean here.  Maybe it doesn't
matter...  My suggestions come from observing what gcc did to
the "faked gsr modelling" I had to use with the current releases
(what moving and eliminating redundant variable settings used in
asms that it did; turned out acceptable FWIW, no redundant
reads), which would map directly to my suggestion.  But I guess
you have a point in that your setting-gsr-then-using-builtins
maps better to the machine insns.

BTW, don't forget to clobber GSR at call insns!

> So on the first set I'd have to read it, mask it out, then set the
> scale bits.  A needless waste of 20 to 30 cycles on UltraSPARC-III.

No, it doesn't have to be read.  If the fields have (useful)
implicit initial values (like scale=7 and align=4) at the
beginning of any function, you wouldn't have to read and mask,
just set.  (Caveat: the port has to have a way to emit a
gsr-setting even if the supposed-initial-values are specified -
like another register or variable, or the initial-value
machinery as I suggested.)

> If you just call "__vis_write_gsr()" at the beginning of your kernel,
> you can tell the compiler that you just want to set the scaling bits
> and you don't care about the others at all.

Don't care how?  They're certainly set by both __vis_write_gsr()
and alignaddr and used by faligndata.  I guess my confusion is
that I don't see what aspect is "don't care" here that'd be
"care" with my suggestion.

> > Another aspect would be to model the different GSR fields as
> > different registers; they're used completely differently and
> > just happen to be set with the same insn.  That might help gcc
> > getting rid of redundant settings.
>
> Again, this doesn't allow the user to say "don't care" about the other
> fields like a plain "__vis_write_gsr(2<<3)" call does.

But that'd set GSR.alignaddr_offset to 0 rather than "don't
care".

> You know what fields actually matter for your code.

A good reason to model them as different registers!

Still, this is a good start and much more workable (and
schedulable) than what's already there, thank you for that.
It doesn't add hurdles for a revisit, if the mechanism is found
unusable or the generated code pessimal!

brgds, H-P


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-24 Thread David Miller
From: Hans-Peter Nilsson 
Date: Sat, 24 Sep 2011 18:37:33 -0400 (EDT)

> BTW, don't forget to clobber GSR at call insns!

This I explicitly want to avoid and is an explicit design decision.

Like I said the model is like setting the floating point rounding mode
for a family of functions.

You set the floating point rounding mode at the top level, run your
kernel and all the helper functions in that mode.

The %gsr scaling factor is to be used similarly.

You have to control all the functions that get called once you set the
%gsr before a calculation, and they either have to explicitly save and
restore the %gsr around changes to %gsr, or have been designed to use
the %gsr setting made by the caller.
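
As an illustration of the "save and restore around changes" convention, a
scoped guard could look like the sketch below.  The register is modelled
by a plain global and the model_* accessors are hypothetical stand-ins
for __vis_read_gsr() and __vis_write_gsr(), so that the example runs on
any host:

```cpp
#include <cstdint>

// Stand-in for %gsr; not the real register.
std::uint64_t model_gsr = 0;

inline std::uint64_t model_read_gsr () { return model_gsr; }
inline void model_write_gsr (std::uint64_t v) { model_gsr = v; }

// Saves the current value on construction, installs a new one, and
// restores the saved value on destruction.
class scoped_gsr
{
public:
  explicit scoped_gsr (std::uint64_t v) : saved_ (model_read_gsr ())
  {
    model_write_gsr (v);
  }
  ~scoped_gsr () { model_write_gsr (saved_); }

  scoped_gsr (const scoped_gsr &) = delete;
  scoped_gsr &operator= (const scoped_gsr &) = delete;

private:
  std::uint64_t saved_;
};

// A helper that needs its own scale factor (7) and must not disturb
// the setting made by its caller.  Returns the value it ran with.
inline std::uint64_t helper_with_own_scale ()
{
  scoped_gsr guard (std::uint64_t (7) << 3);
  return model_read_gsr ();
}
```

A helper written this way can be called from a kernel that has set its
own scale factor without the compiler having to treat %gsr as clobbered
by calls.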

The last thing I want to do is have to teach reload how to handle this
thing, it simply makes no sense to put that much engineering into it
if it is for zero or very little gain.

And it would explicitly prevent the kind of model I see as the most
reasonable for using this register, in that if we clobber it during
a call there is no way for the user to say not to save and restore
%gsr over a call.

>> So on the first set I'd have to read it, mask it out, then set the
>> scale bits.  A needless waste of 20 to 30 cycles on UltraSPARC-III.
> 
> No, it doesn't have to be read.  If the fields have (useful)
> implicit initial values (like scale=7 and align=4) at the
> beginning of any function, you wouldn't have to read and mask,
> just set.

You can't just set.  What about the VIS-2.0 byte-mask at the top
32-bits of the register, are you just going to clobber that when you
change the scale factor?

If we support treating the different fields as different registers we
have to preserve the setting of the other fields of %gsr when we
change one of them.  There are 5 fields currently defined:

1) align address <2:0>
2) scale factor <7:3>
3) interval rounding mode (VIS 2.0) <26:25>
4) interval mode enable <27>
5) Byte mask (VIS 2.0) <63:32>
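
The field layout listed above can be written down as a small decoding
sketch.  The struct and function names are invented for this note; only
the bit positions come from the list.  It also shows why changing a
single field while preserving the others needs a read-modify-write:

```cpp
#include <cstdint>

struct gsr_fields
{
  unsigned int align;   // <2:0>
  unsigned int scale;   // <7:3>
  unsigned int irnd;    // <26:25>
  unsigned int im;      // <27>
  std::uint32_t mask;   // <63:32>
};

// Split a 64-bit %gsr value into its five fields.
inline gsr_fields decode_gsr (std::uint64_t gsr)
{
  gsr_fields f;
  f.align = static_cast<unsigned int> (gsr & 0x7);
  f.scale = static_cast<unsigned int> ((gsr >> 3) & 0x1f);
  f.irnd = static_cast<unsigned int> ((gsr >> 25) & 0x3);
  f.im = static_cast<unsigned int> ((gsr >> 27) & 0x1);
  f.mask = static_cast<std::uint32_t> (gsr >> 32);
  return f;
}

// Change only the scale factor, keeping every other bit of OLD_GSR.
// The caller has to supply the old value, i.e. read the register first.
inline std::uint64_t set_scale_preserving (std::uint64_t old_gsr,
                                           unsigned int scale)
{
  const std::uint64_t scale_mask = std::uint64_t (0x1f) << 3;
  return (old_gsr & ~scale_mask)
         | ((std::uint64_t (scale) << 3) & scale_mask);
}
```

Decoding the value 2<<3 gives scale 2 and zero in every other field,
including the align field and the byte mask.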

And also this idea of using get_hard_reg_initial_val() to "optimize"
this kind of usage especially forces us to clobber the %gsr over
function calls which, as stated, I want to avoid if at all possible.

>> Again, this doesn't allow the user to say "don't care" about the other
>> fields like a plain "__vis_write_gsr(2<<3)" call does.
> 
> But that'd set GSR.alignaddr_offset to 0 rather than "don't
> care".

Zero is equivalent to "don't care" in this situation if either
1) you aren't doing any falignaddr operations or 2) you are
then going to subsequently do an "alignaddr" to set that field
up.

Look at the medialib code, that's basically the usage model there
and I think it's quite reasonable.

> It doesn't add hurdles for a revisit, if the mechanism is found
> unusable or the generated code pessimal!

Absolutely, thanks for your review.


Re: [PATCH PR43513, 1/3] Replace vla with array - Implementation.

2011-09-24 Thread Tom de Vries
On 09/24/2011 05:29 PM, Eric Botcazou wrote:
>> This is an updated version of the patch. I have 2 new patches and an
>> updated testcase which I will send out individually.
>>
>> Patch set was bootstrapped and reg-tested on x86_64.
>>
>> Ok for trunk?
>>
>> Thanks,
>> - Tom
>>
>> 2011-07-30  Tom de Vries  
>>
>>  PR middle-end/43513
>>  * Makefile.in (tree-ssa-ccp.o): Add $(PARAMS_H) to rule.
>>  * tree-ssa-ccp.c (params.h): Include.
>>  (fold_builtin_alloca_for_var): New function.
>>  (ccp_fold_stmt): Use fold_builtin_alloca_for_var.
> 
> We have detected another fallout on some Ada code: the transformation 
> replaces 
> a call to __builtin_alloca with &var, i.e. it introduces an aliased variable, 
> which invalidates the points-to information of some subsequent call, fooling 
> DSE into thinking that it can eliminate a live store.
> 
> The brute force approach
> 
> Index: tree-ssa-ccp.c
> ===
> --- tree-ssa-ccp.c  (revision 179038)
> +++ tree-ssa-ccp.c  (working copy)
> @@ -2014,7 +2014,10 @@ do_ssa_ccp (void)
>ccp_initialize ();
>ssa_propagate (ccp_visit_stmt, ccp_visit_phi_node);
>if (ccp_finalize ())
> -return (TODO_cleanup_cfg | TODO_update_ssa | TODO_remove_unused_locals);
> +return (TODO_cleanup_cfg
> +   | TODO_update_ssa
> +   | TODO_rebuild_alias
> +   | TODO_remove_unused_locals);
>else
>  return 0;
>  }
> 
> works, but we might want to be more clever.  Thoughts?
> 

How about attached (untested) patch implementing a conservative, but
runtime-efficient approach?

Thanks,
- Tom
Index: gcc/tree-ssa-ccp.c
===
--- gcc/tree-ssa-ccp.c (revision 179043)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -1729,6 +1729,7 @@ fold_builtin_alloca_for_var (gimple stmt
   array_type = build_array_type_nelts (elem_type, n_elem);
   var = create_tmp_var (array_type, NULL);
   DECL_ALIGN (var) = align;
+  pt_solution_add_var (&get_ptr_info (lhs)->pt, var);
 
   /* Fold alloca to the address of the array.  */
   return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));
Index: gcc/tree-ssa-alias.h
===
--- gcc/tree-ssa-alias.h (revision 179043)
+++ gcc/tree-ssa-alias.h (working copy)
@@ -125,6 +125,7 @@ extern void dump_alias_stats (FILE *);
 
 /* In tree-ssa-structalias.c  */
 extern unsigned int compute_may_aliases (void);
+extern void pt_solution_add_var (struct pt_solution *, tree);
 extern bool pt_solution_empty_p (struct pt_solution *);
 extern bool pt_solution_includes_global (struct pt_solution *);
 extern bool pt_solution_includes (struct pt_solution *, const_tree);
Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c (revision 179043)
+++ gcc/tree-ssa-structalias.c (working copy)
@@ -5952,6 +5952,14 @@ pt_solution_ior_into (struct pt_solution
   bitmap_ior_into (dest->vars, src->vars);
 }
 
+void
+pt_solution_add_var (struct pt_solution *dest, tree var)
+{
+  struct pt_solution var_pt;
+  pt_solution_set_var (&var_pt, var);
+  pt_solution_ior_into (dest, &var_pt);
+}
+
 /* Return true if the points-to solution *PT is empty.  */
 
 bool
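
The idea behind pt_solution_add_var in the patch above can be illustrated
with a toy model.  Here a points-to solution is just a set of variable
names; the toy_* types and functions are invented for this note, and the
model ignores everything else a real points-to solution tracks:

```cpp
#include <set>
#include <string>

// Toy points-to solution: the variables a pointer may point to.
struct toy_pt_solution
{
  std::set<std::string> vars;
};

// Counterpart of pt_solution_add_var: record that the pointer may now
// also point to VAR.
inline void toy_pt_solution_add_var (toy_pt_solution &pt,
                                     const std::string &var)
{
  pt.vars.insert (var);
}

// Counterpart of pt_solution_includes.
inline bool toy_pt_solution_includes (const toy_pt_solution &pt,
                                      const std::string &var)
{
  return pt.vars.count (var) != 0;
}

// A store to VAR may only be treated as dead if no later access
// through the pointer can read it.
inline bool toy_store_is_removable (const toy_pt_solution &later_ptr,
                                    const std::string &var)
{
  return !toy_pt_solution_includes (later_ptr, var);
}
```

In this model, once the alloca result is replaced by the address of the
new array, a store to that array stops looking removable only after the
array has been added to the pointer's solution.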


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-24 Thread Hans-Peter Nilsson
On Sat, 24 Sep 2011, David Miller wrote:

> From: Hans-Peter Nilsson 
> Date: Sat, 24 Sep 2011 18:37:33 -0400 (EDT)
>
> > BTW, don't forget to clobber GSR at call insns!
>
> This I explicitly want to avoid and is an explicit design decision.

Aha, now I get it; that's certainly key.  Thanks for taking the time.

Yes, it's certainly more flexible to have the user set GSR than
allowing gcc to clobber it when seeing VIS intrinsics, at the
minor usability cost of the user having to keep track of GSR
separately to when used in the individual intrinsics.

> Like I said the model is like setting the floating point rounding mode
> for a family of functions.

Aha 2: I didn't interpret what you wrote as referring to the
model; I thought you meant the actual function (one of the
usages of the fpack insns being "fixed math").  Sure.

> Zero is equivalent to "don't care" in this situation if either
> 1) you aren't doing any falignaddr operations or 2) you are

(JFTR, "faligndata")

> then going to subsequently do an "alignaddr" to set that field
> up.

brgds, H-P
PS. gcc-4.7/changes.html?


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-24 Thread David Miller
From: Hans-Peter Nilsson 
Date: Sat, 24 Sep 2011 19:32:55 -0400 (EDT)

> PS. gcc-4.7/changes.html?

Also on my TODO list, and Eric made some noise about documenting these
improvements as well, thanks for the reminder.

I'll post and commit the current version of my %gsr changes after my
bootstrap/testsuite run finishes.


C++ PATCH to implement C++11 non-static data member initializers

2011-09-24 Thread Jason Merrill
This patch implements C++11 non-static data member initializers (NSDMI), 
as proposed in 
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2756.htm and 
specified by the C++11 standard.


For ease of reading, the changes are broken into four patches:

1) Implementation of non-static data member initializers with immediate 
parsing.  This patch was somewhat smaller than it might have been 
because there was still some code left from when G++ experimented with 
non-static data member initializers back in the early '90s, though I 
needed to fix a typo at one point.


2) Implementation of deferred parsing of NSDMI.  As with member function 
bodies and default arguments, names used in an NSDMI can be declared 
later in the class body, so we need to wait and parse them at the end of 
the class.


3) Implementation of deferred instantiation of NSDMI.  This is not 
specified by the standard, but seems natural given the other 
similarities between NSDMI and default arguments.


4) Implementation of core issue 1351 for NSDMI: discussion of this issue 
at the Bloomington meeting led to general agreement that an NSDMI that 
can throw should cause an implicitly-declared default constructor to be 
declared noexcept(false).  This patch is currently broken because it 
conflicts with deferred parsing (#2 above); we can't tell whether the 
NSDMI throws until we parse it, but we can't parse it until we've 
completed the class, and declaring the default constructor is part of 
completing the class.  So I've marked the test as XFAIL pending 
committee direction on this point.
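
As an illustration of why deferred parsing (#2 above) is needed, the sketch below uses names in NSDMIs that are only declared further down in the class body. Buffer and its members are hypothetical names chosen for this example, not code from the patch.

```cpp
#include <cassert>

// Hypothetical example type; not taken from the patch or its tests.
struct Buffer {
  // Both initializers refer to names declared later in the class, so
  // they can only be parsed once the whole class body has been seen.
  int capacity = default_capacity();
  int used = kInitialUsed;

  // Members are initialized in declaration order, so capacity and used
  // already hold their values when this initializer runs.
  int remaining = capacity - used;

  static int default_capacity() { return 64; }
  enum { kInitialUsed = 0 };
};
```

With immediate parsing only (#1 above), the first two initializers would fail name lookup; with deferred parsing they are handled the same way as names used in member function bodies and default arguments.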


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 3f2eec8c217136c06a1c24af4ded96a6cde5a4df
Author: Jason Merrill 
Date:   Fri Sep 16 11:11:02 2011 -0400

	Implement C++11 non-static data member initializers.
	* cp-tree.h (enum cpp0x_warn_str): Add CPP0X_NSDMI.
	* error.c (maybe_warn_cpp0x): Handle it.
	* call.c (convert_like_real) [ck_user]: Don't complain about
	using an explicit constructor for direct-initialization.
	* class.c (check_field_decl): Fix ancient typo.
	(check_field_decls): NSDMIs make the default ctor non-trivial.
	* decl.c (cp_finish_decl): Record NSDMI.
	(grokdeclarator): Allow NSDMI.
	* decl2.c (grokfield): Allow NSDMI.  Correct LOOKUP flags.
	* init.c (perform_member_init): Use NSDMI.
	* method.c (walk_field_subobs): Check for NSDMI.
	* parser.c (cp_parser_member_declaration): Parse { } init.
	* semantics.c (register_constexpr_fundef): Don't talk about
	a return statement in a constexpr constructor.
	(cxx_eval_call_expression): Check DECL_INITIAL instead of
	DECL_SAVED_TREE.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 8c99f7a..6a7dfd3 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -5648,6 +5648,9 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
 	/* When converting from an init list we consider explicit
 	   constructors, but actually trying to call one is an error.  */
 	if (DECL_NONCONVERTING_P (convfn) && DECL_CONSTRUCTOR_P (convfn)
+	/* Unless this is for direct-list-initialization.  */
+	&& !(BRACE_ENCLOSED_INITIALIZER_P (expr)
+		 && CONSTRUCTOR_IS_DIRECT_INIT (expr))
 	/* Unless we're calling it for value-initialization from an
 	   empty list, since that is handled separately in 8.5.4.  */
 	&& cand->num_convs > 0)
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index acfe3f2..a7d8218 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -2958,7 +2958,7 @@ check_field_decl (tree field,
 {
   /* `build_class_init_list' does not recognize
 	 non-FIELD_DECLs.  */
-  if (TREE_CODE (t) == UNION_TYPE && any_default_members != 0)
+  if (TREE_CODE (t) == UNION_TYPE && *any_default_members != 0)
 	error ("multiple fields in union %qT initialized", t);
   *any_default_members = 1;
 }
@@ -3256,6 +3256,14 @@ check_field_decls (tree t, tree *access_decls,
 		 "  but does not override %", t);
 }
 
+  /* Non-static data member initializers make the default constructor
+     non-trivial.  */
+  if (any_default_members)
+    {
+      TYPE_NEEDS_CONSTRUCTING (t) = true;
+      TYPE_HAS_COMPLEX_DFLT (t) = true;
+    }
+
   /* If any of the fields couldn't be packed, unset TYPE_PACKED.  */
   if (cant_pack)
 TYPE_PACKED (t) = 0;
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index f2c9211..2f93bba 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -394,7 +394,9 @@ typedef enum cpp0x_warn_str
   /* inline namespaces */
   CPP0X_INLINE_NAMESPACES,
   /* override controls, override/final */
-  CPP0X_OVERRIDE_CONTROLS
+  CPP0X_OVERRIDE_CONTROLS,
+  /* non-static data member initializers */
+  CPP0X_NSDMI
 } cpp0x_warn_str;
   
 /* The various kinds of operation used by composite_pointer_type. */
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 495d8a0..661cc5e 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6075,6 +6075,10 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_

[PATCH] Teach sparc backend about %gsr register and add intrinsics to access it.

2011-09-24 Thread David Miller

As discussed over the past few days.  Committed to trunk.

Hans, thanks again for all of your feedback.  I'll also work
on adding more VIS test cases.

gcc/

* config/sparc/sparc.h (FIRST_PSEUDO_REGISTER): Bump to 103.
(SPARC_GSR_REG): Define.
(FIXED_REGISTERS): Mark GSR as fixed.
(CALL_USED_REGISTERS): Mark GSR as call used.
(HARD_REGNO_NREGS): GSR is always 1 register.
(REG_CLASS_CONTENTS): Add GSR to ALL_REGS.
(REG_ALLOC_ORDER, REG_LEAF_ALLOC_ORDER): Add GSR to the end.
(REGISTER_NAMES): Add "%gsr".
* config/sparc/sparc.md (UNSPEC_ALIGNADDR, UNSPEC_ALIGNADDRL):
Delete.
(UNSPEC_WRGSR): New unspec.
(GSR_REG): New constant.
(type): Add new insn type 'gsr'.
(fpack16_vis, fpackfix_vis, fpack32_vis,
faligndata_vis): Add use of GSR_REG.
(wrgsr_vis, *wrgsr_sp64, wrgsr_v8plus, rdgsr_vis, *rdgsr_sp64,
rdgsr_v8plus): New expanders and insns.
(alignaddr_vis, alignaddrl_vis): Reimplement
using patterns which show that this is a plus in addition to a
modification of GSR_REG, instead of an unspec.
* config/sparc/ultra1_2.md: Handle 'gsr'.
* config/sparc/ultra3.md: Likewise.
* config/sparc/niagara.md: Likewise.
* config/sparc/niagara2.md: Likewise.
* config/sparc/sparc.c (leaf_reg_remap, sparc_leaf_regs): Fill out
end of table.
(sparc_option_override): Make -mvis imply -mv8plus.
(hard_32bit_mode_classes, hard_64bit_mode_classes): Add entries
for %gsr.
(sparc_vis_init_builtins): Build __builtin_vis_write_gsr and
__builtin_vis_read_gsr.
(sparc_expand_builtin): Handle builtins that take one argument and
return void.
(sparc_fold_builtin): Never fold writes to %gsr.
* config/sparc/visintrin.h (__vis_write_gsr, __vis_read_gsr): New.
* doc/extend.texi: Document new VIS intrinsics.
---
 gcc/ChangeLog                |   39 +
 gcc/config/sparc/niagara.md  |    2 +-
 gcc/config/sparc/niagara2.md |    4 +-
 gcc/config/sparc/sparc.c     |   73 +---
 gcc/config/sparc/sparc.h     |   26 +
 gcc/config/sparc/sparc.md    |  129 --
 gcc/config/sparc/ultra1_2.md |    2 +-
 gcc/config/sparc/ultra3.md   |    2 +-
 gcc/config/sparc/visintrin.h |   14 +
 gcc/doc/extend.texi          |    3 +
 10 files changed, 240 insertions(+), 54 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1ca1113..8e86131 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,42 @@
+2011-09-24  David S. Miller  
+
+   * config/sparc/sparc.h (FIRST_PSEUDO_REGISTER): Bump to 103.
+   (SPARC_GSR_REG): Define.
+   (FIXED_REGISTERS): Mark GSR as fixed.
+   (CALL_USED_REGISTERS): Mark GSR as call used.
+   (HARD_REGNO_NREGS): GSR is always 1 register.
+   (REG_CLASS_CONTENTS): Add GSR to ALL_REGS.
+   (REG_ALLOC_ORDER, REG_LEAF_ALLOC_ORDER): Add GSR to the end.
+   (REGISTER_NAMES): Add "%gsr".
+   * config/sparc/sparc.md (UNSPEC_ALIGNADDR, UNSPEC_ALIGNADDRL):
+   Delete.
+   (UNSPEC_WRGSR): New unspec.
+   (GSR_REG): New constant.
+   (type): Add new insn type 'gsr'.
+   (fpack16_vis, fpackfix_vis, fpack32_vis,
+   faligndata_vis)): Add use of GSR_REG.
+   (wrgsr_vis, *wrgsr_sp64, wrgsr_v8plus, rdgsr_vis, *rdgsr_sp64,
+   rdgsr_v8plus): New expanders and insns.
+   (alignaddr_vis, alignaddrl_vis): Reimplement
+   using patterns which show that this is a plus in addition to a
+   modification of GSR_REG, instead of an unspec.
+   * config/sparc/ultra1_2.md: Handle 'gsr'.
+   * config/sparc/ultra3.md: Likewise.
+   * config/sparc/niagara.md: Likewise.
+   * config/sparc/niagara2.md: Likewise.
+   * config/sparc/sparc.c (leaf_reg_remap, sparc_leaf_regs): Fill out
+   end of table.
+   (sparc_option_override): Make -mvis imply -mv8plus.
+   (hard_32bit_mode_classes, hard_64bit_mode_classes): Add entries
+   for %gsr.
+   (sparc_vis_init_builtins): Build __builtin_vis_write_gsr and
+   __builtin_vis_read_gsr.
+   (sparc_expand_buildin): Handle builtins that take one argument and
+   return void.
+   (sparc_fold_builtin): Never fold writes to %gsr.
+   * config/sparc/visintrin.h (__vis_write_gsr, __vis_read_gsr): New.
+   * doc/extend.texi: Document new VIS intrinsics.
+
 2011-09-22  Maxim Kuvyrkov  
 
* ipa-prop.c (ipa_print_node_jump_functions): Fix typos.
diff --git a/gcc/config/sparc/niagara.md b/gcc/config/sparc/niagara.md
index a618b19..3e5a3e5 100644
--- a/gcc/config/sparc/niagara.md
+++ b/gcc/config/sparc/niagara.md
@@ -114,5 +114,5 @@
  */
 (define_insn_reservation "niag_vis" 8
   (and (eq_attr "cpu" "niagara")
-(eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdist,edge"))
+(eq_attr "type" "fga,fgm_pack,fgm_mul,fgm_cmp,fgm_pdis

Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-24 Thread David Miller
From: David Miller 
Date: Sat, 24 Sep 2011 20:05:19 -0400 (EDT)

> From: Hans-Peter Nilsson 
> Date: Sat, 24 Sep 2011 19:32:55 -0400 (EDT)
> 
>> PS. gcc-4.7/changes.html?
> 
> Also on my TODO list, and Eric made some noise about documenting these
> improvements as well, thanks for the reminder.

I just committed an update to the wwwdocs.


Re: [PATCH] Add VIS intrinsics header for sparc.

2011-09-24 Thread David Miller
From: David Miller 
Date: Sat, 24 Sep 2011 02:08:32 -0400 (EDT)

> I'll look into your other suggestion in PR48974, namely making use of
> fone VIS instructions.

Hans, just FYI, here is a patch I am regression testing which
implements this.

diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index cca34ed..317602c 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -18,7 +18,7 @@
 ;; .
 
 ;;; Unused letters:
-;;;ABCD   P Z
+;;;AB   
 ;;;ajklq  tuvwxyz
 
 
@@ -52,6 +52,10 @@
  (and (match_code "const_double")
   (match_test "const_zero_operand (op, mode)")))
 
+(define_constraint "C"
+ "The floating-point all-ones constant"
+ (and (match_code "const_double")
+  (match_test "const_all_ones_operand (op, mode)")))
 
 ;; Integer constant constraints
 
@@ -95,6 +99,10 @@
  (and (match_code "const_int")
   (match_test "ival == 4096")))
 
+(define_constraint "P"
+ "The integer constant -1"
+ (and (match_code "const_int")
+  (match_test "ival == -1")))
 
 ;; Extra constraints
 ;; Our memory extra constraints have to emulate the behavior of 'm' and 'o',
@@ -146,3 +154,8 @@
  "The vector zero constant"
  (and (match_code "const_vector")
   (match_test "const_zero_operand (op, mode)")))
+
+(define_constraint "Z"
+ "The vector all ones constant"
+ (and (match_code "const_vector")
+  (match_test "const_all_ones_operand (op, mode)")))
diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index 4af960a..21399b5 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -29,6 +29,35 @@
   (and (match_code "const_int,const_double,const_vector")
(match_test "op == CONST1_RTX (mode)")))
 
+;; Return true if the integer representation of OP is
+;; all-ones.
+(define_predicate "const_all_ones_operand"
+  (match_code "const_int,const_double,const_vector")
+{
+  if (GET_CODE (op) == CONST_INT && INTVAL (op) == -1)
+    return true;
+#if HOST_BITS_PER_WIDE_INT == 32
+  if (GET_CODE (op) == CONST_DOUBLE
+      && GET_MODE (op) == VOIDmode
+      && CONST_DOUBLE_HIGH (op) == ~(HOST_WIDE_INT)0
+      && CONST_DOUBLE_LOW (op) == ~(HOST_WIDE_INT)0)
+    return true;
+#endif
+  if (GET_CODE (op) == CONST_VECTOR)
+    {
+      int i, num_elem = CONST_VECTOR_NUNITS (op);
+
+      for (i = 0; i < num_elem; i++)
+        {
+          rtx n = CONST_VECTOR_ELT (op, i);
+          if (! const_all_ones_operand (n, mode))
+            return false;
+        }
+      return true;
+    }
+  return false;
+})
+
 ;; Return true if OP is the integer constant 4096.
 (define_predicate "const_4096_operand"
   (and (match_code "const_int")
@@ -211,6 +240,12 @@
   (ior (match_operand 0 "register_operand")
(match_operand 0 "const_zero_operand")))
 
+;; Return true if OP is either the zero constant, the all-ones
+;; constant, or a register.
+(define_predicate "register_or_zero_or_all_ones_operand"
+  (ior (match_operand 0 "register_or_zero_operand")
+   (match_operand 0 "const_all_ones_operand")))
+
 ;; Return true if OP is a register operand in a floating point register.
 (define_predicate "fp_register_operand"
   (match_operand 0 "register_operand")
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index d648e87..3446379 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -1170,9 +1170,11 @@ sparc_expand_move (enum machine_mode mode, rtx *operands)
   if (operands [1] == const0_rtx)
operands[1] = CONST0_RTX (mode);
 
-  /* We can clear FP registers if TARGET_VIS, and always other regs.  */
+  /* We can clear or set to all-ones FP registers if TARGET_VIS, and
+always other regs.  */
   if ((TARGET_VIS || REGNO (operands[0]) < SPARC_FIRST_FP_REG)
- && const_zero_operand (operands[1], mode))
+ && (const_zero_operand (operands[1], mode)
+ || const_all_ones_operand (operands[1], mode)))
return false;
 
   if (REGNO (operands[0]) < SPARC_FIRST_FP_REG
@@ -3096,19 +3098,21 @@ sparc_legitimate_constant_p (enum machine_mode mode, rtx x)
 return true;
 
   /* Floating point constants are generally not ok.
-The only exception is 0.0 in VIS.  */
+The only exception is 0.0 and all-ones in VIS.  */
   if (TARGET_VIS
  && SCALAR_FLOAT_MODE_P (mode)
- && const_zero_operand (x, mode))
+ && (const_zero_operand (x, mode)
+ || const_all_ones_operand (x, mode)))
return true;
 
   return false;
 
 case CONST_VECTOR:
   /* Vector constants are generally not ok.
-The only exception is 0 in VIS.  */
+The only exception is 0 or -1 in VIS.  */
   if (TARGET_VIS
- && const_zero_operand (x, mode))
+ && (const_zero_operand (x, mode)
+ || const_all_ones_operand (x, mode)))
return true;
 
   retu