Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-19 Thread Uros Bizjak
On Tue, Jun 19, 2012 at 8:38 AM, Uros Bizjak  wrote:
> On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson  wrote:
>> On 2012-06-18 13:19, Uros Bizjak wrote:
>>>        /* ??? The builtin doesn't understand that the PCMPESTRI read from
>>>        memory need not be aligned.  */
>>> -      __asm ("%vpcmpestri $0, (%1), %2"
>>> -          : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>>> +      sv = __builtin_ia32_loaddqu ((const char *) s);
>>> +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>>> +
>>
>>
>> Surely the comment can be removed too then?
>
> I'm not sure there. The builtin, as defined, expects V16QI operand
> with xm constraint. Using:
>
> int test (const char *s1)
> {
>  const v16qi *p = (const v16qi *)(unsigned long) s1;
>  return __builtin_ia32_pcmpistri128 (*p, ...);
> }
>
> will generate movdqa before pcmpistri.

Pedantic correction: __builtin_ia32_pcmpistri128 (v16qi_arg, *p, N);

movdqa in front of this builtin will be generated with -O0.

Uros.
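
For readers unfamiliar with the instruction, here is a scalar sketch of what PCMPESTRI computes with control byte 0 (unsigned bytes, "equal any", least-significant index), as used in the patch above. The function name is hypothetical and the model assumes non-negative lengths; it is an illustration of the semantics, not the libcpp code.

```c
/* Scalar model of PCMPESTRI with control byte 0.  Returns the index of
   the first byte in STR[0..str_len) that equals any byte in
   SET[0..set_len), or 16 when nothing matches.  Lengths are clamped to
   16, the number of bytes in an XMM register.  Assumes both lengths
   are non-negative.  */
int
pcmpestri_equal_any_model (const unsigned char *set, int set_len,
                           const unsigned char *str, int str_len)
{
  int i, j;

  if (set_len > 16)
    set_len = 16;
  if (str_len > 16)
    str_len = 16;

  for (i = 0; i < str_len; i++)
    for (j = 0; j < set_len; j++)
      if (str[i] == set[j])
        return i;

  return 16;
}
```

The model only reads bytes, so it has no alignment requirement on STR; the thread above is about whether the builtin's vector operand forces an aligned load in the generated code.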


Re: [v3] PR 53270 fix hppa-linux bootstrap regression

2012-06-19 Thread Jonathan Wakely
On 14 June 2012 23:23, Jonathan Wakely wrote:
>
> For 4.6.4 and 4.7.2 I plan to make a less intrusive change, #undef'ing
> the __GTHREAD_MUTEX_INIT, __GTHREAD_RECURSIVE_MUTEX_INIT and
> __GTHREAD_COND_INIT macros on hppa-linux in C++11 mode, so that the
> init functions are used instead.  This fixes the bootstrap regression
> on hppa-linux without affecting other targets.

Here's the simpler patch I'm committing to the 4.7 and 4.6 branches.

PR libstdc++/53270
* config/os/gnu-linux/os_defines.h: Disable static initializer macros
for gthreads types in C++11 mode.

Tested hppa-linux.
commit 82976f5a0e4a69d247bded9d8bae99a633360f20
Author: Jonathan Wakely 
Date:   Tue Jun 19 01:07:54 2012 +0100

PR libstdc++/53270
* config/os/gnu-linux/os_defines.h: Disable static initializer macros
for gthreads types in C++11 mode.

diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h b/libstdc++-v3/config/os/gnu-linux/os_defines.h
index c4aa305..f41160f 100644
--- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
+++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
@@ -46,4 +46,10 @@
 # undef _GLIBCXX_HAVE_GETS
 #endif
 
+#if defined(__hppa__) && defined(__GXX_EXPERIMENTAL_CXX0X__)
+# define _GTHREAD_USE_MUTEX_INIT_FUNC
+# define _GTHREAD_USE_RECURSIVE_MUTEX_INIT_FUNC
+# define _GTHREAD_USE_COND_INIT_FUNC
+#endif
+
 #endif
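
The general pattern described above (drop the static initializer macro so that the init function is used instead) can be sketched in plain C. Everything prefixed with `demo_` or `DEMO_` is hypothetical; the sketch does not reproduce how the gthreads headers consume the `_GTHREAD_USE_*_INIT_FUNC` macros.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a gthreads mutex type.  */
typedef struct { int locked; bool initialized; } demo_mutex;

/* Static initializer, as most targets would provide it.  */
#define DEMO_MUTEX_INIT { 0, true }

/* Stand-in for the request added by the patch.  Defined here
   unconditionally so the sketch takes the fallback path.  */
#define DEMO_USE_MUTEX_INIT_FUNC

#ifdef DEMO_USE_MUTEX_INIT_FUNC
# undef DEMO_MUTEX_INIT
#endif

/* Run-time initialization, used when no static initializer exists.  */
void
demo_mutex_init_function (demo_mutex *m)
{
  m->locked = 0;
  m->initialized = true;
}

/* Returns 1 if the mutex ended up initialized, whichever path was
   selected by the preprocessor.  */
int
demo_make_mutex (demo_mutex *out)
{
#ifdef DEMO_MUTEX_INIT
  demo_mutex tmp = DEMO_MUTEX_INIT;
  *out = tmp;
#else
  demo_mutex_init_function (out);
#endif
  return out->initialized ? 1 : 0;
}

/* Reports which path the preprocessor selected.  */
int
demo_uses_init_function (void)
{
#ifdef DEMO_MUTEX_INIT
  return 0;
#else
  return 1;
#endif
}
```

Removing the `#define DEMO_USE_MUTEX_INIT_FUNC` line switches the sketch back to the static initializer without touching the callers.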


Re: [Patch] Adjustments for Windows x64 SEH

2012-06-19 Thread Tristan Gingold

On Jun 18, 2012, at 4:28 PM, Kai Tietz wrote:

> Hello Tristan,
> 
> patch works for me, too. Just one nit about the patch.
> 
> 2012/6/18 Tristan Gingold :
>> @@ -8558,6 +8558,11 @@ ix86_frame_pointer_required (void)
>>   if (TARGET_32BIT_MS_ABI && cfun->calls_setjmp)
>> return true;
>> 
>> +  /* Win64 SEH, very large frames need a frame-pointer as maximum stack
>> + allocation is 4GB (add a safety guard for saved registers).  */
>> +  if (TARGET_64BIT_MS_ABI && get_frame_size () + 4096 > SEH_MAX_FRAME_SIZE)
>> +return true;
> Where does this magic 4096 come from?  Is it intended to be the
> page-size, or is it meant to be the maximum stack-frame consumed by
> prologue?

It is an upper bound for the maximum stack-frame consumed by prologue.

>  I would suggest to use here instead:
> +  if (TARGET_64BIT_MS_ABI && get_frame_size () > (SEH_MAX_FRAME_SIZE - 4096))
> +return true;
> 
> Additionally, a testcase for a big stack frame would be interesting.  You
> won't need to make it an execution test; an assembler scan would be
> enough.

I think that a simple build test should make it.

Thanks,
Tristan.
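
The two forms of the check discussed above are equivalent as long as the addition cannot wrap. A minimal sketch of the difference, using 32-bit unsigned arithmetic purely for illustration: the function names and the explicit limit parameter are hypothetical, and GCC's own frame-size type is wider, so the thread does not establish that wraparound can actually occur there.

```c
#include <stdint.h>

#define DEMO_GUARD 4096u   /* upper bound for the prologue's stack use */

/* Form used in the patch: add the guard to the frame size.  In 32-bit
   unsigned arithmetic the sum wraps modulo 2^32 for very large frames.  */
int
demo_needs_fp_add_form (uint32_t frame_size, uint32_t max_frame)
{
  return (uint32_t) (frame_size + DEMO_GUARD) > max_frame;
}

/* Suggested form: subtract the guard from the limit instead.
   Assumes max_frame >= DEMO_GUARD.  */
int
demo_needs_fp_sub_form (uint32_t frame_size, uint32_t max_frame)
{
  return frame_size > max_frame - DEMO_GUARD;
}
```

With a limit of 0x80000000 both forms agree for small and moderately large frames, but for a frame size of 0xFFFFFF9B the addition wraps to 3995 and the first form returns 0 while the second returns 1.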



Re: RFA: Fix PR53688

2012-06-19 Thread Richard Guenther
On Mon, Jun 18, 2012 at 4:59 PM, Michael Matz  wrote:
> Hi,
>
> now that we regard MEM_EXPR as a conservative approximation for MEM_SIZE
> (and MEM_OFFSET) we must ensure that this is really the case.  It isn't
> currently for the string expanders, as they use the MEM_REF (whose address
> was taken) directly as the one to use for MEM_EXPR on the MEM rtx.  That's
> wrong, on gimple side we take the address only and hence its size is
> arbitrary.
>
> So, we have to build a memref always and rewrite its type to one
> representing the real size.  Note that TYPE_MAX_VALUE may be NULL, so we
> don't need to check for 'len' being null or not.
>
> This fixes the C testcase (don't know about fma 3d), and is in
> regstrapping on x86_64-linux.  Okay if that passes?

Ok.

Note that as a followup you should be able to remove the whole

  /* Allow the string and memory builtins to overflow from one
 field into another, see http://gcc.gnu.org/PR23561.
 Thus avoid COMPONENT_REFs in MEM_EXPR unless we know the whole
 memory accessed by the string or memory builtin will fit
 within the field.  */
  if (MEM_EXPR (mem) && TREE_CODE (MEM_EXPR (mem)) == COMPONENT_REF)
{

block.  Also practically (as we are expanding from GIMPLE now), off
should always be zero and TREE_CODE (exp) should never be
POINTER_PLUS_EXPR, nor should there be wrapping conversions.
The 'off' case can also be dealt with by using the offset operand of
the MEM_REF we build.

Finally MEM_EXPR itself has invalid type-based aliasing properties
(it has so even before your patch), of course that doesn't really matter,
as below we do set_mem_alias_set (mem, 0).  Still with MEM_REF
you should be able to do

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 188733)
+++ gcc/builtins.c  (working copy)
@@ -1250,132 +1250,27 @@ expand_builtin_prefetch (tree exp)
 static rtx
 get_memory_rtx (tree exp, tree len)
 {
-  tree orig_exp = exp;
   rtx addr, mem;
-  HOST_WIDE_INT off;

-  /* When EXP is not resolved SAVE_EXPR, MEM_ATTRS can be still derived
- from its expression, for expr->a.b only .a.b is recorded.  */
-  if (TREE_CODE (exp) == SAVE_EXPR && !SAVE_EXPR_RESOLVED_P (exp))
-exp = TREE_OPERAND (exp, 0);
-
-  addr = expand_expr (orig_exp, NULL_RTX, ptr_mode, EXPAND_NORMAL);
+  addr = expand_expr (exp, NULL_RTX, ptr_mode, EXPAND_NORMAL);
   mem = gen_rtx_MEM (BLKmode, memory_address (BLKmode, addr));

-  /* Get an expression we can use to find the attributes to assign to MEM.
- If it is an ADDR_EXPR, use the operand.  Otherwise, dereference it if
- we can.  First remove any nops.  */
-  while (CONVERT_EXPR_P (exp)
-&& POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (exp, 0))))
-exp = TREE_OPERAND (exp, 0);
-
-  off = 0;
-  if (TREE_CODE (exp) == POINTER_PLUS_EXPR
-  && TREE_CODE (TREE_OPERAND (exp, 0)) == ADDR_EXPR
-  && host_integerp (TREE_OPERAND (exp, 1), 0)
-  && (off = tree_low_cst (TREE_OPERAND (exp, 1), 0)) > 0)
-exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
-  else if (TREE_CODE (exp) == ADDR_EXPR)
-exp = TREE_OPERAND (exp, 0);
-  else if (POINTER_TYPE_P (TREE_TYPE (exp)))
-exp = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (exp)), exp);
-  else
-exp = NULL;
+  /* Build a memory reference suitable for MEM_EXPR for use by the
+ alias oracle.  Make sure to give that memory reference a proper
+ access size as well as alias-set zero.  */
+  exp = fold_build2 (MEM_REF,
+build_array_type (char_type_node,
+  build_range_type (sizetype,
+size_one_node, len)),
+exp, build_int_cst (ptr_type_node, 0));

   /* Honor attributes derived from exp, except for the alias set
  (as builtin stringops may alias with anything) and the size
  (as stringops may access multiple array elements).  */
-  if (exp)
-{
-  set_mem_attributes (mem, exp, 0);
-
-  if (off)
-   mem = adjust_automodify_address_nv (mem, BLKmode, NULL, off);
-
-  /* Allow the string and memory builtins to overflow from one
-field into another, see http://gcc.gnu.org/PR23561.
-Thus avoid COMPONENT_REFs in MEM_EXPR unless we know the whole
-memory accessed by the string or memory builtin will fit
-within the field.  */
-  if (MEM_EXPR (mem) && TREE_CODE (MEM_EXPR (mem)) == COMPONENT_REF)
-   {
- tree mem_expr = MEM_EXPR (mem);
- HOST_WIDE_INT offset = -1, length = -1;
- tree inner = exp;
-
- while (TREE_CODE (inner) == ARRAY_REF
-|| CONVERT_EXPR_P (inner)
-|| TREE_CODE (inner) == VIEW_CONVERT_EXPR
-|| TREE_CODE (inner) == SAVE_EXPR)
-   inner = TREE_OPERAND (inner, 0);
-
- gcc_assert (TREE_CODE (inner) == COMPONENT_REF);
-
- if (MEM_OFFSET_K

Re: [patch] Fix PR48109 using artificial top-level asm statements (darwin/objc)

2012-06-19 Thread Richard Guenther
On Mon, Jun 18, 2012 at 7:51 PM, Steven Bosscher  wrote:
> Hello,
>
> This patch started as an attempt to remove #include "output.h" from
> objc/: Instead of writing references directly to asm_out_file, the
> references are output as top-level asm statements. It's a bit of a
> hack, but it works and it's a "better hack" than writing to
> asm_out_file from a front end, and it also happens to fix PR
> objc/48109 to make ObjC on darwin/-m32 LTO-compatible.
>
> Bootstrapped&tested on darwin by Iain, and bootstrapped&tested by me
> on x86_64-unknown-linux-gnu.
> OK for trunk?

Ok for the general idea and implementation, I'd still ask for a darwin
maintainer ack though.

Thanks,
Richard.

> Ciao!
> Steven


Re: [4.6][ARM] Backport "MCR Not available in Thumb1"

2012-06-19 Thread Richard Earnshaw
On 19/06/12 04:03, Joey Ye wrote:
> Backporting trunk r179979
> 
> OK for 4.6?
> 
> Backported from mainline
> 2011-10-14  David Alan Gilbert  
> 
> PR target/48126
> * config/arm/arm.c (arm_output_sync_loop): Move label before
> barrier.
> 
> Index: gcc/config/arm/arm.h
> ===
> --- gcc/config/arm/arm.h  (revision 188331)
> +++ gcc/config/arm/arm.h  (working copy)
> @@ -294,7 +294,8 @@
>  #define TARGET_HAVE_DMB  (arm_arch7)
>  
>  /* Nonzero if this chip implements a memory barrier via CP15.  */
> -#define TARGET_HAVE_DMB_MCR  (arm_arch6k && ! TARGET_HAVE_DMB)
> +#define TARGET_HAVE_DMB_MCR  (arm_arch6 && ! TARGET_HAVE_DMB \
> +  && ! TARGET_THUMB1)
>  
>  /* Nonzero if this chip implements a memory barrier instruction.  */
>  #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
> 
> 

Not ok (yet), the ChangeLog entry doesn't match the patch.

R.



Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant

2012-06-19 Thread Richard Guenther
On Mon, Jun 18, 2012 at 9:51 PM, Jiří Hruška  wrote:
> Hi all,
>
> I have tracked down a bug which results in invalid code being
> generated for indexed TARGET_MEM_REF expressions during dominator
> optimization.
>
> The conditions are: accessing objects adjacent in memory in a loop (in
> order to generate the TARGET_MEM_REF gimple) and optimizing this tree
> item during dom optimization (to trigger folding). There might be
> another set of conditions which get to the same state through a
> different
>
> The problem is that get_ref_base_and_extent() for TARGET_MEM_REF with
> variable index sets `maxsize' to -1 to signal that "via index or
> index2, the whole object can be reached" and returns. But before that,
> if the target object is a declaration with known size and `maxsize' is
> -1, it is updated, which the caller can take (if `maxsize' equals
> the basic `size') as a possibility to fold the expression into a
> constant.
>
> Assuming I understood the code and comments right, the solution is
> then to really take a quick exit in the abovementioned "indexed" case
> instead of just breaking the loop and letting the rest of function
> change the `maxsize' parameter.
>
> A quick search did not reveal any existing ticket for this problem.
> The bug was originally found in GCC 4.6.1 while compiling x86 code
> under MinGW, which is what the attached simplified testcase is based
> upon (compilation with -O1 is OK, anything higher fails).
> GCC 3.4.6 seems unaffected.
> Also the relevant code parts seem unchanged in current trunk.
> Patched build of 4.7.1 survived bootstrap on x86_64-rhel fine.
>
> The attached patch and all changes provided therein are released to
> public domain and can be freely used or modified by anyone.
>
> (This is my first time dealing with GCC bowels, please excuse my
> superficial understanding of everything I have written above.)

The issue is that your testcase is invalid.

__attribute__((section(".rodata$int0"))) const int fooS = 0;
__attribute__((section(".rodata$int1"))) const int foo1 = 1;
__attribute__((section(".rodata$int2"))) const int foo2 = 2;
__attribute__((section(".rodata$int3"))) const int foo3 = 3;
__attribute__((section(".rodata$int4"))) const int fooE = 0;
...
int x = ret(*(&fooS + i));

this access is only ever valid for i == 0 as otherwise you are creating
a pointer that points outside of the object fooS.

Richard.
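
A sketch of a conforming way to iterate over adjacent constants, assuming the goal is simply to index a run of values: keep them in one array so that every access stays inside a single object. The names `foo_table`, `foo_at` and `foo_sum` are hypothetical, and this does not reproduce the per-section placement of the original testcase.

```c
/* One array object instead of five separate variables, so indexing
   never leaves the object.  */
static const int foo_table[] = { 0, 1, 2, 3, 0 };

#define FOO_COUNT (sizeof foo_table / sizeof foo_table[0])

/* Bounds-checked access; returns -1 for an out-of-range index.  */
int
foo_at (unsigned i)
{
  return i < FOO_COUNT ? foo_table[i] : -1;
}

/* Sums all entries by walking the array within its bounds.  */
int
foo_sum (void)
{
  int sum = 0;
  unsigned i;

  for (i = 0; i < FOO_COUNT; i++)
    sum += foo_table[i];
  return sum;
}
```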


> Thanks,
> Jiri Hruska


Re: [PATCH, testsuite]: Fix scan-tree-dump-times argument order in gcc.dg/tree-ssa/vrp68.c.

2012-06-19 Thread Richard Guenther
On Mon, Jun 18, 2012 at 10:01 PM, Janis Johnson wrote:
> On 06/17/2012 05:03 AM, Richard Guenther wrote:
>> On Sun, Jun 17, 2012 at 10:41 AM, Uros Bizjak  wrote:
>>> Hello!
>>>
>>> The testcase still fails on x86_64-pc-linux-gnu with:
>>>
>>> FAIL: gcc.dg/tree-ssa/vrp68.c scan-tree-dump-times vrp1 "link_error" 1
>>>
>>> since there are two calls to link_error.
>>
>> Oops.  I wonder how I did not see those failures myself ...
>>
>> Richard.
>
> I'm confused about what this test is supposed to do.  It uses
> "dg-do link" which means the compile (test for excess errors) will
> fail if there is a reference to link_error.  There are two uses of
> scan-tree-dump-times for the same string in the same file, so one
> of those is guaranteed to fail.  It looks like the scans aren't
> needed, and "dg-do link" is the thing that needs the xfail.

No, the scan-tree-dump-times directives are supposed to check that VRP1
has already done the optimization.  It does not do so fully, which is
why I added the XFAILed scan-tree-dump-times.  But subsequent
optimizations still catch that XFAILed case, so the link succeeds
nevertheless.

The testcase fails now, I must have broken the optimization somehow
and I am looking into it.

Richard.

> Janis


[patch] Fix failing nested-3.C on ARM.

2012-06-19 Thread Richard Earnshaw
The regexp in nested-3.C has to parse the machine-specific comment
character; on ARM that is '@'.

Tested on arm-eabi, where this test now passes.

OK?

R.

* g++.dg/debug/dwarf2/nested-3.C: Add ARM comment character to regexp.
--- g++.dg/debug/dwarf2/nested-3.C  (revision 188750)
+++ g++.dg/debug/dwarf2/nested-3.C  (local)
@@ -59,4 +59,4 @@ main ()
 //
 // Hence the scary regexp:
 //
-// { dg-final { scan-assembler "\[^\n\r\]*\\(DIE \\(0x(\[0-9a-f\]+)\\) 
DW_TAG_namespace\\)\[\n\r\]+\[^\n\r\]*\"thread\[\^\n\r]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\\(DIE
 \\(0x(\[0-9a-f\]+)\\) 
DW_TAG_class_type\\)(\[\n\r\]+\[^\n\r\]*)+\"Executor\[^\n\r\]+\[\n\r\]+\[^\n\r\]*DW_AT_declaration\[\n\r\]+\[^\n\r\]*DW_AT_signature\[^#/!|\]*\[#/!|\]
 
\[^\n\r\]*\\(DIE\[^\n\r\]*DW_TAG_subprogram\\)\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\"CurrentExecutor\[^\n\r\]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*end
 of children of DIE 0x\\3\[\n\r]+\[^\n\r\]*end of children of DIE 
0x\\1\[\n\r]+" } }
+// { dg-final { scan-assembler "\[^\n\r\]*\\(DIE \\(0x(\[0-9a-f\]+)\\) 
DW_TAG_namespace\\)\[\n\r\]+\[^\n\r\]*\"thread\[\^\n\r]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\\(DIE
 \\(0x(\[0-9a-f\]+)\\) 
DW_TAG_class_type\\)(\[\n\r\]+\[^\n\r\]*)+\"Executor\[^\n\r\]+\[\n\r\]+\[^\n\r\]*DW_AT_declaration\[\n\r\]+\[^\n\r\]*DW_AT_signature\[^#/!|@\]*\[#/!|@\]
 
\[^\n\r\]*\\(DIE\[^\n\r\]*DW_TAG_subprogram\\)\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\"CurrentExecutor\[^\n\r\]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*end
 of children of DIE 0x\\3\[\n\r]+\[^\n\r\]*end of children of DIE 
0x\\1\[\n\r]+" } }

[PATCH] Fix PR53708

2012-06-19 Thread Richard Guenther

We are too eager to bump alignment of some decls when vectorizing.
The fix is to not bump alignment of decls the user explicitly
aligned or that are used in an unknown way.

Bootstrapped and tested on i686-darwin9 and x86_64-apple-darwin10
and powerpc-apple-darwin9 by darwin folks, applied.

Richard.

2012-06-19  Richard Guenther  

PR tree-optimization/53708
* tree-vect-data-refs.c (vect_can_force_dr_alignment_p): Preserve
user-supplied alignment and alignment of decls with the used
attribute.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 188733)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -4731,6 +4720,12 @@ vect_can_force_dr_alignment_p (const_tre
   if (TREE_ASM_WRITTEN (decl))
 return false;
 
+  /* Do not override explicit alignment set by the user or the alignment
+ as specified by the ABI when the used attribute is set.  */
+  if (DECL_USER_ALIGN (decl)
+  || DECL_PRESERVE_P (decl))
+return false;
+
   if (TREE_STATIC (decl))
 return (alignment <= MAX_OFILE_ALIGNMENT);
   else
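
The decision order in the hunk above can be modelled with a small stand-alone predicate. This is only a sketch: `struct demo_decl` and the two limit constants are hypothetical stand-ins, and because the non-static branch is cut off in the quoted diff, the final `return` is an assumption rather than the actual GCC code.

```c
#include <stdbool.h>

/* Hypothetical flags modelling the tree predicates used in the patch.  */
struct demo_decl
{
  bool asm_written;   /* models TREE_ASM_WRITTEN */
  bool user_align;    /* models DECL_USER_ALIGN */
  bool preserve;      /* models DECL_PRESERVE_P (the "used" attribute) */
  bool is_static;     /* models TREE_STATIC */
};

/* Made-up limits, in bits, for illustration only.  */
#define DEMO_MAX_OFILE_ALIGNMENT 256u
#define DEMO_MAX_STACK_ALIGNMENT 128u

/* Returns true if the vectorizer may raise the decl's alignment.  */
bool
demo_can_force_alignment (const struct demo_decl *d, unsigned alignment)
{
  /* Already emitted: too late to change anything.  */
  if (d->asm_written)
    return false;

  /* New in the patch: leave user-aligned and "used" decls alone.  */
  if (d->user_align || d->preserve)
    return false;

  if (d->is_static)
    return alignment <= DEMO_MAX_OFILE_ALIGNMENT;

  /* Assumed: automatic variables are bounded by a stack limit.  */
  return alignment <= DEMO_MAX_STACK_ALIGNMENT;
}
```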


Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Mon, 18 Jun 2012, William J. Schmidt wrote:

> On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > 
> 
> > 
> > Hmm.  I don't like this patch or its general idea too much.  Instead
> > I'd like us to move more of the cost model detail to the target, giving
> > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > posting the overall idea at some point, but let me repeat it here instead
> > of trying to find that e-mail.
> > 
> > The basic interface of the cost model should be, in targetm.vectorize
> > 
> >   /* Tell the target to start cost analysis of a loop or a basic-block
> >  (if the loop argument is NULL).  Returns an opaque pointer to
> >  target-private data.  */
> >   void *init_cost (struct loop *loop);
> > 
> >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> >   void add_stmt_cost (void *data, unsigned n,
> >   vectorized-stmt-kind,
> >   enum machine_mode vector_mode);
> > 
> >   /* Tell the target to compute and return the cost of the accumulated
> >  statements and free any target-private data.  */
> >   unsigned finish_cost (void *data);
> > 
> > with eventually slightly different signatures for add_stmt_cost
> > (like pass in the original scalar stmt?).
> > 
> > It allows the target, at finish_cost time, to evaluate things like
> > register pressure and resource utilization.
> > 
> > Thanks,
> > Richard.
> 
> I've been looking at this in between other projects.  I wanted to be
> sure I understood the SLP infrastructure and whether it would cause any
> problems.  It looks to me like it will be mostly ok.  One issue I
> noticed is a possible difference in the order in which SLP instructions
> are analyzed and the order in which the instructions are "issued" during
> transformation.
> 
> For both loop analysis and basic block analysis, SLP trees are
> constructed and analyzed prior to examining other vectorizable
> instructions.  Their costs are calculated and stored in the SLP trees at
> this time.  Later, when transforming statements to their vector
> equivalents, instructions in the block (or loop body) are processed in
> order until the first instruction that's part of an SLP tree is
> encountered.  At that point, every instruction that's part of any SLP
> tree is transformed; then the vectorizer continues with the remaining
> non-SLP vectorizable statements.
> 
> So if we do the natural and easy thing of placing calls to add_stmt_cost
> everywhere that costs are calculated today, the order that those costs
> are presented to the back end model will possibly be different than the
> order they are actually "emitted."

Interesting.  But I suppose this is similar to how pattern statements
are handled?  Thus, the whole pattern sequence is processed when
we encounter the "main" pattern statement?

> For a first cut at this, I suggest ignoring the problem other than to
> document it as an opportunity for improvement.  Later we could improve
> it by using an add_stmt_slp_cost () interface (or adding an is_slp
> flag), and another interface to be called at the time during analysis
> when the SLP statements will be issued during transformation.  This
> would allow the back end model to queue up the SLP costs in a separate
> vector and later place them in its internal structures at the
> appropriate place.
>
> It should eventually be possible to remove these fields/accessors:
> 
>  * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
>  * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
>  * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
> 
> However, I think this should be delayed until we have the basic
> infrastructure in place for the new model and well-tested.

Indeed.
 
> The other issue is that we should have the model track both the inside
> and outside costs if we're going to get everything into the target
> model.  For a first pass we can ignore this and keep the existing logic
> for the outside costs.  Later we should add some interfaces analogous to
> add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
> the model can track this stuff as carefully as it wants to.

Outside costs are merely added to the niter * inner-cost metric to
be compared with the scalar cost niter * scalar-cost, right?  Thus
they would be tracked completely separate - eventually similar to
how we compute the cost of the scalar loop.

> So, I'd propose going at this in several phases:
> 
> (1) Add calls to the new interface without disturbing existing logic;
> modify the profitability algorithms to query the new model for inside
> costs.  Default algorithm for the model is to just sum costs as is done
> today.

Right.

> (x) Add heuristics to target models as desired.
> (2) Handle the SLP ordering problem.
> (3) Handle outside costs in the target model.
> (4) Remove the now unnecessary cost fields and the calls that set them.
> 
> Item (x) can happen anytime after item (1).
> 
>
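
The default behaviour proposed in phase (1), where the model just sums costs, can be sketched against the three-hook interface quoted above. All `demo_` names, the statement kinds and the per-kind costs are hypothetical, and the vector-mode argument is omitted; this is an illustration of the opaque-handle design, not GCC code.

```c
#include <stdlib.h>

struct demo_loop;   /* opaque; NULL stands for basic-block analysis */

enum demo_stmt_kind { DEMO_VECTOR_STMT, DEMO_VECTOR_LOAD, DEMO_VECTOR_STORE };

/* Target-private accumulator behind the opaque pointer.  */
struct demo_cost_data { unsigned total; };

/* Start cost analysis; returns target-private data (NULL on failure).  */
void *
demo_init_cost (struct demo_loop *loop)
{
  struct demo_cost_data *d = malloc (sizeof *d);

  (void) loop;
  if (d)
    d->total = 0;
  return d;
}

/* Add the cost of N statements of the given kind.  */
void
demo_add_stmt_cost (void *data, unsigned n, enum demo_stmt_kind kind)
{
  /* Made-up per-statement costs, indexed by kind.  */
  static const unsigned per_stmt[] = { 1, 2, 2 };
  struct demo_cost_data *d = data;

  if (d)
    d->total += n * per_stmt[kind];
}

/* Return the accumulated cost and free the private data.  */
unsigned
demo_finish_cost (void *data)
{
  struct demo_cost_data *d = data;
  unsigned total = d ? d->total : 0;

  free (d);
  return total;
}
```

Because each analysis owns its own handle, two cost computations can be interleaved without disturbing each other; a target wanting smarter heuristics would record more than a running sum and evaluate it in the finish hook.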

Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Mon, 18 Jun 2012, William J. Schmidt wrote:

> On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
> > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > 
> > 
> > > 
> > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > I'd like us to move more of the cost model detail to the target, giving
> > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > posting the overall idea at some point, but let me repeat it here instead
> > > of trying to find that e-mail.
> > > 
> > > The basic interface of the cost model should be, in targetm.vectorize
> > > 
> > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > >  target-private data.  */
> > >   void *init_cost (struct loop *loop);
> > > 
> > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > >   void add_stmt_cost (void *data, unsigned n,
> > > vectorized-stmt-kind,
> > >   enum machine_mode vector_mode);
> > > 
> > >   /* Tell the target to compute and return the cost of the accumulated
> > >  statements and free any target-private data.  */
> > >   unsigned finish_cost (void *data);
> 
> By the way, I don't see much point in passing the void *data around
> here.  Too many levels of interfaces that we'd have to pass it around in
> the vectorizer, so it would just sit in a static variable.  Might as
> well let the data be wholly private to the target.

Ok, so you'd have void init_cost (struct loop *) and
unsigned finish_cost (void); then?  Static variables are of course
not properly "abstracted" so we can't ever compute two sets of costs
at the same time ... but that's true all-over-the-place in GCC ...

With previous discussion the add_stmt_cost hook would be split up
to also allow passing the operation code for example.

Richard.
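
A minimal sketch of the static-variable variant discussed above, assuming the target keeps a single file-scope accumulator. The `demo_` names and the explicit per-statement cost argument are hypothetical. It also shows the limitation mentioned in the reply: starting a second analysis discards the first.

```c
#include <stddef.h>

struct demo_loop;   /* opaque; NULL stands for basic-block analysis */

/* The only state: one accumulator shared by every analysis.  */
static unsigned demo_accumulated_cost;

/* Start cost analysis; any analysis in progress is silently reset.  */
void
demo_init_cost_static (struct demo_loop *loop)
{
  (void) loop;
  demo_accumulated_cost = 0;
}

/* Add the cost of N statements costing COST_PER_STMT each.  */
void
demo_add_stmt_cost_static (unsigned n, unsigned cost_per_stmt)
{
  demo_accumulated_cost += n * cost_per_stmt;
}

/* Return the accumulated cost and clear the state.  */
unsigned
demo_finish_cost_static (void)
{
  unsigned total = demo_accumulated_cost;

  demo_accumulated_cost = 0;
  return total;
}
```

If a caller runs init, adds some statements, and then runs init again for another region before finishing the first, only the second region's costs are reported.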


Re: RFA: Fix PR53688

2012-06-19 Thread Michael Matz
Hi,

On Tue, 19 Jun 2012, Richard Guenther wrote:

> > So, we have to build a memref always and rewrite its type to one
> > representing the real size.  Note that TYPE_MAX_VALUE may be NULL, so we
> > don't need to check for 'len' being null or not.
> >
> > This fixes the C testcase (don't know about fma 3d), and is in
> > regstrapping on x86_64-linux.  Okay if that passes?
> 
> Ok.

Thanks, but I now know why we built an INDIRECT_REF :)  
build_simple_mem_ref() only handles some very constrained arguments, 
namely pointers and offset ADDR_EXPRs when the offset is a constant.  
It doesn't for instance handle &bla->a[i] (it asserts).  So the patch 
trips over the assert in build_simple_mem_ref on "__builtin_memset 
(&p->c[i], 0, 42);".

I could build INDIRECT_REFs always instead of MEM_REFs, this fixes the bug 
too, but it wouldn't generate any MEM_EXPRs anymore, and hence the whole 
brouhaha would be dead code (well, except for alignment setting).

Or I could build MEM_REFs directly, not via build_simple_mem_ref, that 
also works, but leaves us with such MEM_EXPRs sometimes:

  (mem/c:BLK (reg:DI 65) [0 MEM[(void *)&p_1(D)->c[i_2(D)]]+0 A8])

Note the complicated and non-canonical expression in the MEM[].  I'm not 
sure if the disambiguators do anything interesting with such expressions.  
If they don't, we'd save memory by not generating this MEM_EXPR at all.

If the latter is acceptable, then I indeed can as well wrap everything in 
a MEM_REF like you proposed (possibly with a predicate "simple enough" 
that reflects what build_simple_mem_ref is also checking) and be done with 
it.

So, what should it be?


Ciao,
Michael.

Re: [PATCH] Fix PR53708

2012-06-19 Thread Richard Sandiford
Richard Guenther  writes:
> We are too eager to bump alignment of some decls when vectorizing.
> The fix is to not bump alignment of decls the user explicitly
> aligned or that are used in an unknown way.

I thought attribute((__aligned__)) only set a minimum alignment
for variables?  Most uses I've seen have been trying to get
better performance from higher alignment, so it might not go
down well if the attribute stopped the vectoriser from increasing
the alignment still further.

Richard


Re: [PATCH] Fix PR53708

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, Richard Sandiford wrote:

> Richard Guenther  writes:
> > We are too eager to bump alignment of some decls when vectorizing.
> > The fix is to not bump alignment of decls the user explicitly
> > aligned or that are used in an unknown way.
> 
> I thought attribute((__aligned__)) only set a minimum alignment
> for variables?  Most uses I've seen have been trying to get
> better performance from higher alignment, so it might not go
> down well if the attribute stopped the vectoriser from increasing
> the alignment still further.

That's what the documentation says indeed.  I'm not sure which
part of the patch fixes the ObjC failures where the alignment
is part of the ABI (and I suppose ObjC then mis-uses the aligned
attribute?).

Richard.


Re: [PATCH] Improve pattern recognizer for division by constant (PR tree-optimization/51581)

2012-06-19 Thread Jakub Jelinek
On Mon, Jun 18, 2012 at 04:44:21PM -0700, Richard Henderson wrote:
> On 2012-06-14 13:58, Jakub Jelinek wrote:
> > +  if (!supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt,
> > +  vecwtype, vectype,
> > +  &dummy, &dummy, &dummy_code,
> > +  &dummy_code, &dummy_int, &dummy_vec))
> > +return NULL;
> 
> 
> It would be nice to be able to handle high-part multiplies as well, e.g. 
> VEC_WIDEN_MULT_HI_EXPR.  Which is what Altivec provides, and not 
> VEC_WIDEN_MULT.

Sure, but we don't have a tree code for that right now, do we?
VEC_WIDEN_MULT_HI_EXPR is just one half of the widened multiply results,
not all the high halves of the widened multiply.
For 16-bit multiplication we could also use {,V}PMULH{,U}W
(for 32-bit multiplication we use two {,V}PMUL{,U}DQ plus shifts afterwards).

Jakub
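
To make the distinction concrete, here is a scalar sketch of what a per-element unsigned high-part multiply yields for 16-bit operands, and how it serves division by a constant. The function names are hypothetical, and the magic constant 0xAAAB with a final shift by one is a standard choice for dividing by 3, not something taken from the patch.

```c
#include <stdint.h>

/* High 16 bits of the 32-bit product: the per-element result of an
   unsigned high-part multiply.  */
uint16_t
demo_mulhi_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * (uint32_t) b) >> 16);
}

/* x / 3 for any 16-bit unsigned x, without a division instruction.
   0xAAAB * 3 == 2^17 + 1, so (x * 0xAAAB) >> 17 equals x / 3 over the
   whole 16-bit range.  */
uint16_t
demo_div3_u16 (uint16_t x)
{
  return (uint16_t) (demo_mulhi_u16 (x, 0xAAABu) >> 1);
}
```

A widening multiply that produces only half of the full-width products cannot be used this way directly, since the high part of every element's product is needed.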


Re: RFA: Fix PR53688

2012-06-19 Thread Richard Guenther
On Tue, Jun 19, 2012 at 12:13 PM, Michael Matz  wrote:
> Hi,
>
> On Tue, 19 Jun 2012, Richard Guenther wrote:
>
>> > So, we have to build a memref always and rewrite its type to one
>> > representing the real size.  Note that TYPE_MAX_VALUE may be NULL, so we
>> > don't need to check for 'len' being null or not.
>> >
>> > This fixes the C testcase (don't know about fma 3d), and is in
>> > regstrapping on x86_64-linux.  Okay if that passes?
>>
>> Ok.
>
> Thanks, but I now know why we built an INDIRECT_REF :)
> build_simple_mem_ref() only handles some very constrained arguments,
> namely pointers and offset ADDR_EXPRs when the offset is a constant.
> It doesn't for instance handle &bla->a[i] (it asserts).  So the patch
> trips over the assert in build_simple_mem_ref on "__builtin_memset
> (&p->c[i], 0, 42);".
>
> I could build INDIRECT_REFs always instead of MEM_REFs, this fixes the bug
> too, but it wouldn't generate any MEM_EXPRs anymore, and hence the whole
> brouhaha would be dead code (well, except for alignment setting).
>
> Or I could build MEM_REFs directly, not via build_simple_mem_ref, that
> also works, but leaves us with such MEM_EXPRs sometimes:
>
>  (mem/c:BLK (reg:DI 65) [0 MEM[(void *)&p_1(D)->c[i_2(D)]]+0 A8])
>
> Note the complicated and non-canonical expression in the MEM[].  I'm not
> sure if the disambiguators do anything interesting with such expressions.
> If they don't, we'd save memory by not generating this MEM_EXPR at all.
>
> If the latter is acceptable, then I indeed can as well wrap everything in
> a MEM_REF like you proposed (possibly with a predicate "simple enough"
> that reflects what build_simple_mem_ref is also checking) and be done with
> it.
>
> So, what should it be?

The MEM_REF is acceptable to the tree oracle and it can extract
points-to information from it.

Thus for simplicity unconditionally building the above is the best.

We can always massage both fold to handle more complex cases
(like the POINTER_PLUS_EXPR case) and set_mem_attributes to
canonicalize / strip the above from useless parts.

Thanks,
Richard.

>
> Ciao,
> Michael.


RE: [4.6][ARM] Backport "MCR Not available in Thumb1"

2012-06-19 Thread Joey Ye
Oops! Sorry for such a stupid problem.

2012-06-18  Joey Ye  

Backported from mainline
2011-10-14  David Alan Gilbert  

* config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR Not available in
Thumb1.

Index: gcc/config/arm/arm.h
===
--- gcc/config/arm/arm.h  (revision 188331)
+++ gcc/config/arm/arm.h  (working copy)
@@ -294,7 +294,8 @@
 #define TARGET_HAVE_DMB  (arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR  (arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR  (arm_arch6 && ! TARGET_HAVE_DMB \
+  && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)


> -Original Message-
> From: Richard Earnshaw
> Sent: Tuesday, June 19, 2012 16:43
> To: Joey Ye
> Cc: GCC Patches
> Subject: Re: [4.6][ARM] Backport "MCR Not available in Thumb1"
> 
> On 19/06/12 04:03, Joey Ye wrote:
> > Backporting trunk r179979
> >
> > OK for 4.6?
> >
> > Backported from mainline
> > 2011-10-14  David Alan Gilbert  
> >
> > PR target/48126
> > * config/arm/arm.c (arm_output_sync_loop): Move label before
> > barrier.
> >
> > Index: gcc/config/arm/arm.h
> > ===
> > --- gcc/config/arm/arm.h  (revision 188331)
> > +++ gcc/config/arm/arm.h  (working copy)
> > @@ -294,7 +294,8 @@
> >  #define TARGET_HAVE_DMB  (arm_arch7)
> >
> >  /* Nonzero if this chip implements a memory barrier via CP15.  */
> > -#define TARGET_HAVE_DMB_MCR  (arm_arch6k && ! TARGET_HAVE_DMB)
> > +#define TARGET_HAVE_DMB_MCR  (arm_arch6 && ! TARGET_HAVE_DMB \
> > +  && ! TARGET_THUMB1)
> >
> >  /* Nonzero if this chip implements a memory barrier instruction.  */
> >  #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
> >
> >
> 
> Not ok (yet), the ChangeLog entry doesn't match the patch.
> 
> R.





Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote:
> On Mon, 18 Jun 2012, William J. Schmidt wrote:
> 
> > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > 
> > 
> > > 
> > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > I'd like us to move more of the cost model detail to the target, giving
> > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > posting the overall idea at some point, but let me repeat it here instead
> > > of trying to find that e-mail.
> > > 
> > > The basic interface of the cost model should be, in targetm.vectorize
> > > 
> > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > >  target-private data.  */
> > >   void *init_cost (struct loop *loop);
> > > 
> > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > >   void add_stmt_cost (void *data, unsigned n,
> > > vectorized-stmt-kind,
> > >   enum machine_mode vector_mode);
> > > 
> > >   /* Tell the target to compute and return the cost of the accumulated
> > >  statements and free any target-private data.  */
> > >   unsigned finish_cost (void *data);
> > > 
> > > with eventually slightly different signatures for add_stmt_cost
> > > (like pass in the original scalar stmt?).
> > > 
> > > It allows the target, at finish_cost time, to evaluate things like
> > > register pressure and resource utilization.
> > > 
> > > Thanks,
> > > Richard.
> > 
> > I've been looking at this in between other projects.  I wanted to be
> > sure I understood the SLP infrastructure and whether it would cause any
> > problems.  It looks to me like it will be mostly ok.  One issue I
> > noticed is a possible difference in the order in which SLP instructions
> > are analyzed and the order in which the instructions are "issued" during
> > transformation.
> > 
> > For both loop analysis and basic block analysis, SLP trees are
> > constructed and analyzed prior to examining other vectorizable
> > instructions.  Their costs are calculated and stored in the SLP trees at
> > this time.  Later, when transforming statements to their vector
> > equivalents, instructions in the block (or loop body) are processed in
> > order until the first instruction that's part of an SLP tree is
> > encountered.  At that point, every instruction that's part of any SLP
> > tree is transformed; then the vectorizer continues with the remaining
> > non-SLP vectorizable statements.
> > 
> > So if we do the natural and easy thing of placing calls to add_stmt_cost
> > everywhere that costs are calculated today, the order that those costs
> > are presented to the back end model will possibly be different than the
> > order they are actually "emitted."
> 
> Interesting.  But I suppose this is similar to how pattern statements
> are handled?  Thus, the whole pattern sequence is processed when
> we encounter the "main" pattern statement?

Yes, but the difference is that both vect_analyze_stmt and
vect_transform_loop handle the pattern statements in the same order
(thankfully -- I would hate to have to deal with the pattern mess).
With SLP, all SLP statements are analyzed ahead of time, but they aren't
transformed until one of them is encountered in the statement walk.
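
The ordering difference can be illustrated with a toy model. This is only a sketch: the struct and both functions are hypothetical and do not correspond to vectorizer data structures; they just replay "SLP first" for analysis and "all SLP at the first SLP statement" for transformation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy statement: an id and whether it belongs to some SLP tree.  */
struct toy_stmt
{
  int id;
  bool in_slp;
};

/* Order in which costs would be computed: all SLP statements first,
   then the remaining statements in program order.  Writes the ids to
   OUT (which must hold N entries) and returns the count written.  */
size_t
analysis_order (const struct toy_stmt *stmts, size_t n, int *out)
{
  size_t k = 0;
  assert (stmts != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    if (stmts[i].in_slp)
      out[k++] = stmts[i].id;
  for (size_t i = 0; i < n; i++)
    if (!stmts[i].in_slp)
      out[k++] = stmts[i].id;
  return k;
}

/* Order in which statements would be transformed: walk in program
   order; on meeting the first SLP statement, emit every SLP statement
   at once, then carry on with the non-SLP ones.  */
size_t
transform_order (const struct toy_stmt *stmts, size_t n, int *out)
{
  size_t k = 0;
  bool slp_done = false;
  assert (stmts != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (!stmts[i].in_slp)
        out[k++] = stmts[i].id;
      else if (!slp_done)
        {
          for (size_t j = 0; j < n; j++)
            if (stmts[j].in_slp)
              out[k++] = stmts[j].id;
          slp_done = true;
        }
    }
  return k;
}
```

For a block of four statements where the second and fourth are SLP, the first function yields 2, 4, 1, 3 and the second yields 1, 2, 4, 3, so a cost model fed at analysis time sees a different sequence than the one that is eventually emitted.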

> 
> > For a first cut at this, I suggest ignoring the problem other than to
> > document it as an opportunity for improvement.  Later we could improve
> > it by using an add_stmt_slp_cost () interface (or adding an is_slp
> > flag), and another interface to be called at the time during analysis
> > when the SLP statements will be issued during transformation.  This
> > would allow the back end model to queue up the SLP costs in a separate
> > vector and later place them in its internal structures at the
> > appropriate place.
> >
> > It should eventually be possible to remove these fields/accessors:
> > 
> >  * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
> >  * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
> >  * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
> > 
> > However, I think this should be delayed until we have the basic
> > infrastructure in place for the new model and well-tested.
> 
> Indeed.
> 
> > The other issue is that we should have the model track both the inside
> > and outside costs if we're going to get everything into the target
> > model.  For a first pass we can ignore this and keep the existing logic
> > for the outside costs.  Later we should add some interfaces analogous to
> > add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
> > the model can track this stuff as carefully as it wants to.
> 
> Outside costs are merely added to the niter * inner-cost metric to
> be compared with the scalar cost niter * scalar-cost, right?  Thus
> they would be tracked completely separate - eventually similar to
> how we compute 

Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

> On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote:
> > On Mon, 18 Jun 2012, William J. Schmidt wrote:
> > 
> > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > > 
> > > 
> > > > 
> > > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > > I'd like us to move more of the cost model detail to the target, giving
> > > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > > posting the overall idea at some point, but let me repeat it here 
> > > > instead
> > > > of trying to find that e-mail.
> > > > 
> > > > The basic interface of the cost model should be, in targetm.vectorize
> > > > 
> > > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > > >  target-private data.  */
> > > >   void *init_cost (struct loop *loop);
> > > > 
> > > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > > >   void add_stmt_cost (void *data, unsigned n,
> > > >   vectorized-stmt-kind,
> > > >   enum machine_mode vector_mode);
> > > > 
> > > >   /* Tell the target to compute and return the cost of the accumulated
> > > >  statements and free any target-private data.  */
> > > >   unsigned finish_cost (void *data);
> > > > 
> > > > with eventually slightly different signatures for add_stmt_cost
> > > > (like pass in the original scalar stmt?).
> > > > 
> > > > It allows the target, at finish_cost time, to evaluate things like
> > > > register pressure and resource utilization.
> > > > 
> > > > Thanks,
> > > > Richard.
> > > 
> > > I've been looking at this in between other projects.  I wanted to be
> > > sure I understood the SLP infrastructure and whether it would cause any
> > > problems.  It looks to me like it will be mostly ok.  One issue I
> > > noticed is a possible difference in the order in which SLP instructions
> > > are analyzed and the order in which the instructions are "issued" during
> > > transformation.
> > > 
> > > For both loop analysis and basic block analysis, SLP trees are
> > > constructed and analyzed prior to examining other vectorizable
> > > instructions.  Their costs are calculated and stored in the SLP trees at
> > > this time.  Later, when transforming statements to their vector
> > > equivalents, instructions in the block (or loop body) are processed in
> > > order until the first instruction that's part of an SLP tree is
> > > encountered.  At that point, every instruction that's part of any SLP
> > > tree is transformed; then the vectorizer continues with the remaining
> > > non-SLP vectorizable statements.
> > > 
> > > So if we do the natural and easy thing of placing calls to add_stmt_cost
> > > everywhere that costs are calculated today, the order that those costs
> > > are presented to the back end model will possibly be different than the
> > > order they are actually "emitted."
> > 
> > Interesting.  But I suppose this is similar to how pattern statements
> > are handled?  Thus, the whole pattern sequence is processed when
> > we encounter the "main" pattern statement?
> 
> Yes, but the difference is that both vect_analyze_stmt and
> vect_transform_loop handle the pattern statements in the same order
> (thankfully -- I would hate to have to deal with the pattern mess).
> With SLP, all SLP statements are analyzed ahead of time, but they aren't
> transformed until one of them is encountered in the statement walk.

Ah, ok.  I suppose we can simply declare that when we register
vectorized stmts with the backend they are in arbitrary order.
After all this is not supposed to be another machine dependent reorg
phase (to quote David).

> > 
> > > For a first cut at this, I suggest ignoring the problem other than to
> > > document it as an opportunity for improvement.  Later we could improve
> > > it by using an add_stmt_slp_cost () interface (or adding an is_slp
> > > flag), and another interface to be called at the time during analysis
> > > when the SLP statements will be issued during transformation.  This
> > > would allow the back end model to queue up the SLP costs in a separate
> > > vector and later place them in its internal structures at the
> > > appropriate place.
> > >
> > > It should eventually be possible to remove these fields/accessors:
> > > 
> > >  * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
> > >  * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
> > >  * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
> > > 
> > > However, I think this should be delayed until we have the basic
> > > infrastructure in place for the new model and well-tested.
> > 
> > Indeed.
> > 
> > > The other issue is that we should have the model track both the inside
> > > and outside costs if we're going to get everything into the target
> > > model.  For a first pass we can ignore this 

Re: [4.6][ARM] Backport "MCR Not available in Thumb1"

2012-06-19 Thread Richard Earnshaw
On 19/06/12 12:26, Joey Ye wrote:
> Oops! Sorry for such a stupid problem.
> 
> 2012-06-18  Joey Ye  
> 
> Backported from mainline
> 2011-10-14  David Alan Gilbert  
> 
> * config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR Not available in
> Thumb1.
> 

OK.

R.



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote:
> On Mon, 18 Jun 2012, William J. Schmidt wrote:
> 
> > On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
> > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > > 
> > > 
> > > > 
> > > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > > I'd like us to move more of the cost model detail to the target, giving
> > > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > > posting the overall idea at some point, but let me repeat it here 
> > > > instead
> > > > of trying to find that e-mail.
> > > > 
> > > > The basic interface of the cost model should be, in targetm.vectorize
> > > > 
> > > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > > >  target-private data.  */
> > > >   void *init_cost (struct loop *loop);
> > > > 
> > > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
> > > >   void add_stmt_cost (void *data, unsigned n,
> > > >   vectorized-stmt-kind,
> > > >   enum machine_mode vector_mode);
> > > > 
> > > >   /* Tell the target to compute and return the cost of the accumulated
> > > >  statements and free any target-private data.  */
> > > >   unsigned finish_cost (void *data);
> > 
> > By the way, I don't see much point in passing the void *data around
> > here.  Too many levels of interfaces that we'd have to pass it around in
> > the vectorizer, so it would just sit in a static variable.  Might as
> > well let the data be wholly private to the target.
> 
> Ok, so you'd have void init_cost (struct loop *) and
> unsigned finish_cost (void); then?  Static variables are of course
> not properly "abstracted" so we can't ever compute two sets of costs
> at the same time ... but that's true all-over-the-place in GCC ...

It's a fair point, and perhaps I'll decide to pass the data pointer
around anyway to keep that option open.  We'll see which looks uglier.
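
For reference, a minimal sketch of the pointer-passing variant, assuming a target that does nothing more than sum costs. The toy_ names and the accumulator struct are hypothetical, and the loop argument and statement kind of the proposed hooks are left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical target-private accumulator.  */
struct toy_cost_data
{
  unsigned total;
};

/* Models init_cost: allocate fresh per-analysis state.  */
void *
toy_init_cost (void)
{
  struct toy_cost_data *d = malloc (sizeof *d);
  assert (d != NULL);
  d->total = 0;
  return d;
}

/* Models add_stmt_cost: N statements, each of cost STMT_COST.  */
void
toy_add_stmt_cost (void *data, unsigned n, unsigned stmt_cost)
{
  struct toy_cost_data *d = data;
  assert (d != NULL);
  d->total += n * stmt_cost;
}

/* Models finish_cost: return the accumulated cost and free the state.  */
unsigned
toy_finish_cost (void *data)
{
  struct toy_cost_data *d = data;
  unsigned total;
  assert (d != NULL);
  total = d->total;
  free (d);
  return total;
}
```

Because each call to toy_init_cost returns its own state, two cost computations can be interleaved without disturbing each other, which is exactly what a single static variable inside the target would rule out.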

> 
> With previous discussion the add_stmt_cost hook would be split up
> to also allow passing the operation code for example.

I remember having this discussion, and I was looking for it to check on
the details, but I can't seem to find it either in my inbox or in the
archives.  Can you please point me to that again?  Sorry for the bother.

Thanks,
Bill

> 
> Richard.
> 



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

> On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote:
> > On Mon, 18 Jun 2012, William J. Schmidt wrote:
> > 
> > > On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
> > > > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > > > > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> > > > > 
> > > > 
> > > > > 
> > > > > Hmm.  I don't like this patch or its general idea too much.  Instead
> > > > > I'd like us to move more of the cost model detail to the target, 
> > > > > giving
> > > > > it a chance to look at the whole loop before deciding on a cost.  ISTR
> > > > > posting the overall idea at some point, but let me repeat it here 
> > > > > instead
> > > > > of trying to find that e-mail.
> > > > > 
> > > > > The basic interface of the cost model should be, in targetm.vectorize
> > > > > 
> > > > >   /* Tell the target to start cost analysis of a loop or a basic-block
> > > > >  (if the loop argument is NULL).  Returns an opaque pointer to
> > > > >  target-private data.  */
> > > > >   void *init_cost (struct loop *loop);
> > > > > 
> > > > >   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  
> > > > > */
> > > > >   void add_stmt_cost (void *data, unsigned n,
> > > > > vectorized-stmt-kind,
> > > > >   enum machine_mode vector_mode);
> > > > > 
> > > > >   /* Tell the target to compute and return the cost of the accumulated
> > > > >  statements and free any target-private data.  */
> > > > >   unsigned finish_cost (void *data);
> > > 
> > > By the way, I don't see much point in passing the void *data around
> > > here.  Too many levels of interfaces that we'd have to pass it around in
> > > the vectorizer, so it would just sit in a static variable.  Might as
> > > well let the data be wholly private to the target.
> > 
> > Ok, so you'd have void init_cost (struct loop *) and
> > unsigned finish_cost (void); then?  Static variables are of course
> > not properly "abstracted" so we can't ever compute two sets of costs
> > at the same time ... but that's true all-over-the-place in GCC ...
> 
> It's a fair point, and perhaps I'll decide to pass the data pointer
> around anyway to keep that option open.  We'll see which looks uglier.
> 
> > 
> > With previous discussion the add_stmt_cost hook would be split up
> > to also allow passing the operation code for example.
> 
> I remember having this discussion, and I was looking for it to check on
> the details, but I can't seem to find it either in my inbox or in the
> archives.  Can you please point me to that again?  Sorry for the bother.

It was in the "Correct cost model for strided loads" thread.

Richard.


Re: [PATCH] Fix PR53708

2012-06-19 Thread Dominique Dhumieres
On Tue, 19 Jun 2012, Richard Guenther wrote:
> 
> > Richard Guenther  writes:
> > > We are too eager to bump alignment of some decls when vectorizing.
> > > The fix is to not bump alignment of decls the user explicitly
> > > aligned or that are used in an unknown way.
> > 
> > I thought attribute((__aligned__)) only set a minimum alignment for
> > variables?  Most uses I've seen have been trying to get better
> > performance from higher alignment, so it might not go down well if the
> > attribute stopped the vectoriser from increasing the alignment still
> > further.
> 
> That's what the documentation says indeed.  I'm not sure which part of
> the patch fixes the ObjC failures where the alignment is part of the ABI
> (and I suppose ObjC then mis-uses the aligned attribute?).

A quick test shows that 

if (DECL_PRESERVE_P (decl))

alone is enough to fix the objc failures, while they are still there if 
one uses only

if (DECL_USER_ALIGN (decl))

Dominique


[PATCH][5/n] VRP and anti-ranges

2012-06-19 Thread Richard Guenther

This adjusts intersect_ranges to match what will become union_ranges
(but in a separate patch).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-06-19  Richard Guenther  

* tree-vrp.c (intersect_ranges): Handle more cases.
(vrp_intersect_ranges): Dump what we intersect and call ...
(vrp_intersect_ranges_1): ... this.
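
As a reading aid for the new range/anti-range handling in the "[ ( ) ]" family of cases, here is a simplified sketch over plain ints. It is not the tree-vrp.c code: the toy_ types are hypothetical, INT_MIN and INT_MAX stand in for vrp_val_is_min and vrp_val_is_max, and the fully equal case (which the patch turns into VR_UNDEFINED) is excluded by an assertion.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

enum toy_range_type { TOY_RANGE, TOY_ANTI_RANGE };

struct toy_range
{
  enum toy_range_type type;
  int min, max;
};

/* Intersect the range VR0 with the anti-range VR1, assuming VR1's
   bounds lie within VR0's and the two are not identical.  */
struct toy_range
toy_intersect_range_anti (struct toy_range vr0, struct toy_range vr1)
{
  bool mineq = vr0.min == vr1.min;
  bool maxeq = vr0.max == vr1.max;

  assert (vr0.type == TOY_RANGE && vr1.type == TOY_ANTI_RANGE);
  assert (vr0.min <= vr1.min && vr1.max <= vr0.max);
  assert (!(mineq && maxeq));

  if (mineq)
    /* The left gap is empty: keep what lies right of the hole.  */
    vr0.min = vr1.max + 1;
  else if (maxeq)
    /* The right gap is empty: keep what lies left of the hole.  */
    vr0.max = vr1.min - 1;
  else if (vr0.min == INT_MIN && vr0.max == INT_MAX)
    /* The range is effectively varying: the anti-range says more.  */
    vr0 = vr1;
  /* Otherwise keep the range, accepting a conservative result.  */
  return vr0;
}
```

For example, intersecting [0, 10] with ~[0, 3] gives [4, 10], whereas intersecting [0, 10] with ~[3, 4] leaves [0, 10], since a single range cannot represent the two remaining pieces.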

Index: gcc/tree-vrp.c
===================================================================
*** gcc/tree-vrp.c  (revision 188771)
--- gcc/tree-vrp.c  (working copy)
*** intersect_ranges (enum value_range_type
*** 6781,6789 
  enum value_range_type vr1type,
  tree vr1min, tree vr1max)
  {
/* [] is vr0, () is vr1 in the following classification comments.  */
!   if (operand_less_p (*vr0max, vr1min) == 1
!   || operand_less_p (vr1max, *vr0min) == 1)
  {
/* [ ] ( ) or ( ) [ ]
 If the ranges have an empty intersection, the result of the
--- 6781,6811 
  enum value_range_type vr1type,
  tree vr1min, tree vr1max)
  {
+   bool mineq = operand_equal_p (*vr0min, vr1min, 0);
+   bool maxeq = operand_equal_p (*vr0max, vr1max, 0);
+ 
/* [] is vr0, () is vr1 in the following classification comments.  */
!   if (mineq && maxeq)
! {
!   /* [(  )] */
!   if (*vr0type == vr1type)
!   /* Nothing to do for equal ranges.  */
!   ;
!   else if ((*vr0type == VR_RANGE
!   && vr1type == VR_ANTI_RANGE)
!  || (*vr0type == VR_ANTI_RANGE
!  && vr1type == VR_RANGE))
!   {
! /* For anti-range with range intersection the result is empty.  */
! *vr0type = VR_UNDEFINED;
! *vr0min = NULL_TREE;
! *vr0max = NULL_TREE;
!   }
!   else
!   gcc_unreachable ();
! }
!   else if (operand_less_p (*vr0max, vr1min) == 1
!  || operand_less_p (vr1max, *vr0min) == 1)
  {
/* [ ] ( ) or ( ) [ ]
 If the ranges have an empty intersection, the result of the
*** intersect_ranges (enum value_range_type
*** 6813,6831 
  /* Take VR0.  */
}
  }
!   else if (operand_less_p (vr1max, *vr0max) == 1
!  && operand_less_p (*vr0min, vr1min) == 1)
  {
!   /* [ (  ) ]  */
!   if (*vr0type == VR_RANGE)
{
! /* If the outer is a range choose the inner one.
!???  If the inner is an anti-range this arbitrarily chooses
!the anti-range.  */
  *vr0type = vr1type;
  *vr0min = vr1min;
  *vr0max = vr1max;
}
else if (*vr0type == VR_ANTI_RANGE
   && vr1type == VR_ANTI_RANGE)
/* If both are anti-ranges the result is the outer one.  */
--- 6835,6882 
  /* Take VR0.  */
}
  }
!   else if ((maxeq || operand_less_p (vr1max, *vr0max) == 1)
!  && (mineq || operand_less_p (*vr0min, vr1min) == 1))
  {
!   /* [ (  ) ] or [(  ) ] or [ (  )] */
!   if (*vr0type == VR_RANGE
! && vr1type == VR_RANGE)
{
! /* If both are ranges the result is the inner one.  */
  *vr0type = vr1type;
  *vr0min = vr1min;
  *vr0max = vr1max;
}
+   else if (*vr0type == VR_RANGE
+  && vr1type == VR_ANTI_RANGE)
+   {
+ /* Choose the right gap if the left one is empty.  */
+ if (mineq)
+   {
+ if (TREE_CODE (vr1max) == INTEGER_CST)
+   *vr0min = int_const_binop (PLUS_EXPR, vr1max, integer_one_node);
+ else
+   *vr0min = vr1max;
+   }
+ /* Choose the left gap if the right one is empty.  */
+ else if (maxeq)
+   {
+ if (TREE_CODE (vr1min) == INTEGER_CST)
+   *vr0max = int_const_binop (MINUS_EXPR, vr1min,
+  integer_one_node);
+ else
+   *vr0max = vr1min;
+   }
+ /* Choose the anti-range if the range is effectively varying.  */
+ else if (vrp_val_is_min (*vr0min)
+  && vrp_val_is_max (*vr0max))
+   {
+ *vr0type = vr1type;
+ *vr0min = vr1min;
+ *vr0max = vr1max;
+   }
+ /* Else choose the range.  */
+   }
else if (*vr0type == VR_ANTI_RANGE
   && vr1type == VR_ANTI_RANGE)
/* If both are anti-ranges the result is the outer one.  */
*** intersect_ranges (enum value_range_type
*** 6841,6856 
else
gcc_unreachable ();
  }
!   else if (operand_less_p (*vr0max, vr1max) == 1
!  && operand_less_p (vr1min, *vr0min) == 1)
  {
!   /* ( [  ] )  */
!   if (vr1type == VR_RANGE)
!   /* If the outer is a range, choose the inner one.
!  ???  If the inner is an anti-range this arbitrarily chooses
!  the anti-range.  */
;
else if (*vr0type == V

[arm] Remove obsolete FPA support (7/n): Tidy up attributes

2012-06-19 Thread Richard Earnshaw
This patch cleans up some more of the resulting fall-out from removing
the FPA and maverick co-processors.  In particular it covers:

- Removing the redundant states from the type attributes
- Removing some now redundant UNSPEC values.
- Removing some state from the generic scheduler description that is now
no-longer needed.

Tested on arm-eabi and installed on trunk.

* arm.md (enum unspec): Delete UNSPEC_SIN and UNSPEC_COS.
(attr type): Remove fmul, ffmul, farith, ffarith, float_em,
f_fpa_load, f_fpa_store, f_mem_r, r_mem_f.
(attr write_conflict, attr core_cycles): Update.
* arm-generic.md (r_mem_f_wbuf): Delete reservation.

R.

Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md   (revision 188771)
+++ config/arm/arm.md   (working copy)
@@ -65,12 +65,6 @@ (define_constants
 ;; Unspec enumerators for iwmmxt2 are defined in iwmmxt2.md
 
 (define_c_enum "unspec" [
-  UNSPEC_SIN       ; `sin' operation (MODE_FLOAT):
-                   ;   operand 0 is the result,
-                   ;   operand 1 the parameter.
-  UNPSEC_COS       ; `cos' operation (MODE_FLOAT):
-                   ;   operand 0 is the result,
-                   ;   operand 1 the parameter.
   UNSPEC_PUSH_MULT  ; `push multiple' operation:
 ;   operand 0 is the first register,
 ;   subsequent registers are in parallel (use ...)
@@ -321,21 +315,11 @@ (define_attr "insn"
 ; float        a floating point arithmetic operation (subject to expansion)
 ; fdivd        DFmode floating point division
 ; fdivs        SFmode floating point division
-; fmul         Floating point multiply
-; ffmul        Fast floating point multiply
-; farith       Floating point arithmetic (4 cycle)
-; ffarith      Fast floating point arithmetic (2 cycle)
-; float_em     a floating point arithmetic operation that is normally emulated
-;              even on a machine with an fpa.
-; f_fpa_load   a floating point load from memory. Only for the FPA.
-; f_fpa_store  a floating point store to memory. Only for the FPA.
 ; f_load[sd]   A single/double load from memory. Used for VFP unit.
 ; f_store[sd]  A single/double store to memory. Used for VFP unit.
 ; f_flag       a transfer of co-processor flags to the CPSR
-; f_mem_r      a transfer of a floating point register to a real reg via mem
-; r_mem_f      the reverse of f_mem_r
-; f_2_r        fast transfer float to arm (no memory needed)
-; r_2_f        fast transfer arm to float
+; f_2_r        transfer float to core (no memory needed)
+; r_2_f        transfer core to float
 ; f_cvt        convert floating<->integral
 ; branch       a branch
 ; call         a subroutine call
@@ -351,18 +335,59 @@ (define_attr "insn"
 ;
 
 (define_attr "type"
-   "alu,alu_shift,alu_shift_reg,mult,block,float,fdivx,fdivd,fdivs,fmul,fmuls,fmuld,fmacs,fmacd,ffmul,farith,ffarith,f_flag,float_em,f_fpa_load,f_fpa_store,f_loads,f_loadd,f_stores,f_stored,f_mem_r,r_mem_f,f_2_r,r_2_f,f_cvt,branch,call,load_byte,load1,load2,load3,load4,store1,store2,store3,store4,fconsts,fconstd,fadds,faddd,ffariths,ffarithd,fcmps,fcmpd,fcpys"
-   (if_then_else
-(eq_attr "insn" "smulxy,smlaxy,smlalxy,smulwy,smlawx,mul,muls,mla,mlas,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals")
-(const_string "mult")
-(const_string "alu")))
+ "alu,\
+  alu_shift,\
+  alu_shift_reg,\
+  mult,\
+  block,\
+  float,\
+  fdivd,\
+  fdivs,\
+  fmuls,\
+  fmuld,\
+  fmacs,\
+  fmacd,\
+  f_flag,\
+  f_loads,\
+  f_loadd,\
+  f_stores,\
+  f_stored,\
+  f_2_r,\
+  r_2_f,\
+  f_cvt,\
+  branch,\
+  call,\
+  load_byte,\
+  load1,\
+  load2,\
+  load3,\
+  load4,\
+  store1,\
+  store2,\
+  store3,\
+  store4,\
+  fconsts,\
+  fconstd,\
+  fadds,\
+  faddd,\
+  ffariths,\
+  ffarithd,\
+  fcmps,\
+  fcmpd,\
+  fcpys"
+ (if_then_else 
+(eq_attr "insn" "smulxy,smlaxy,smlalxy,smulwy,smlawx,mul,muls,mla,mlas,\
+umull,umulls,umlal,umlals,smull,smulls,smlal,smlals")
+(const_string "mult")
+(const_string "alu")))
 
 ; Is this an (integer side) multiply with a 64-bit result?
 (define_attr "mul64" "no,yes"
-(if_then_else
-  (eq_attr "insn" "smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals")
-  (const_string "yes")
-  (const_string "no")))
+  (if_then_else
+(eq_attr "insn"
+ "smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals")
+(const_string "yes")
+(const_string "no")))
 
 ; wtype for WMMX insn scheduling purposes.
 (define_attr "wtype"
@@ -486,7 +511,7 @@ (define_attr "model_wbuf" "no,yes" (cons
 ; to stall the processor.  Used with model_wbuf above.
 (define_attr "write_conflict" "no,yes"
   (if_then_else (eq_attr "type"
-
"block,float_em,f_fpa_load,f_fpa_store,f_mem_r,r_me

[PATCH][AARCH64]: Invent new regclass - FP low regs.

2012-06-19 Thread Tejas Belagod


Hi,

The attached patch invents a new register class V0 - V15 that is needed for some
lane variants of AdvSIMD instructions that can only take V0 - V15 as their 
indexed register when working on half-word type.
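
A rough sketch of what the new class amounts to, written as stand-alone C rather than as target macros. The register numbers (V0 to V31 at 32 to 63) and all toy_ names are assumptions made for this example only; the real definitions are in aarch64.md and aarch64.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed numbering for this sketch only.  */
enum { TOY_V0_REGNUM = 32, TOY_V15_REGNUM = 47, TOY_V31_REGNUM = 63 };

enum toy_reg_class { TOY_NO_REGS, TOY_FP_LO_REGS, TOY_FP_REGS };

/* True for any of V0 - V31.  */
bool
toy_fp_regnum_p (int regno)
{
  return regno >= TOY_V0_REGNUM && regno <= TOY_V31_REGNUM;
}

/* True only for V0 - V15, in the spirit of FP_LO_REGNUM_P.  */
bool
toy_fp_lo_regnum_p (int regno)
{
  return regno >= TOY_V0_REGNUM && regno <= TOY_V15_REGNUM;
}

/* Smallest class containing REGNO, in the spirit of
   aarch64_regno_regclass: the low class is checked first because it
   is a subset of the full FP class.  */
enum toy_reg_class
toy_regno_regclass (int regno)
{
  assert (regno >= 0);
  if (toy_fp_lo_regnum_p (regno))
    return TOY_FP_LO_REGS;
  if (toy_fp_regnum_p (regno))
    return TOY_FP_REGS;
  return TOY_NO_REGS;
}
```

Under these assumptions V15 classifies as the low class and V16 as the general FP class, which is the distinction the new constraint relies on for the half-word lane variants.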


Regression tests are happy. OK?

Thanks,
Tejas Belagod.
ARM.

Changelog:

2012-06-19  Tejas Belagod  

gcc/
* config/aarch64/aarch64-simd.md (aarch64_sqdmulh_lane,
aarch64_sqdmll_lane_internal,
aarch64_sqdmlal_lane, aarch64_sqdmlal_laneq,
aarch64_sqdmlsl_lane, aarch64_sqdmlsl_laneq,
aarch64_sqdmll2_lane_internal,
aarch64_sqdmlal2_lane, aarch64_sqdmlal2_laneq,
aarch64_sqdmlsl2_lane, aarch64_sqdmlsl2_laneq,
aarch64_sqdmull_lane_internal, aarch64_sqdmull_lane,
aarch64_sqdmull_laneq, aarch64_sqdmull2_lane_internal,
aarch64_sqdmull2_lane, aarch64_sqdmull2_laneq): Change the
constraint of the indexed operand to use <vwx> instead of w.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs): Add case for
FP_LO_REGS class.
(aarch64_regno_regclass): Return FP_LO_REGS if register in V0 - V15.
(aarch64_secondary_reload): Change condition to check for both FP reg
classes.
(aarch64_class_max_nregs): Add case for FP_LO_REGS.
* config/aarch64/aarch64.h (reg_class): New register class FP_LO_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(FP_LO_REGNUM_P): New.
* config/aarch64/aarch64.md (V15_REGNUM): New.
* config/aarch64/constraints.md (x): New register constraint.
* config/aarch64/iterators.md (vwx): New.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9ceefee..43017df 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1897,7 +1897,7 @@
 (unspec:VSDQ_HSI
  [(match_operand:VSDQ_HSI 1 "register_operand" "w")
(vec_select:
- (match_operand: 2 "register_operand" "w")
+ (match_operand: 2 "register_operand" "")
  (parallel [(match_operand:SI 3 "immediate_operand" "i")]))]
 VQDMULH))]
   "TARGET_SIMD"
@@ -1940,7 +1940,7 @@
  (sign_extend:
(vec_duplicate:VD_HSI
  (vec_select:
-   (match_operand: 3 "register_operand" "w")
+   (match_operand: 3 "register_operand" "")
(parallel [(match_operand:SI 4 "immediate_operand" "i")])))
   ))
(const_int 1]
@@ -1960,7 +1960,7 @@
(match_operand:SD_HSI 2 "register_operand" "w"))
  (sign_extend:
(vec_select:
- (match_operand: 3 "register_operand" "w")
+ (match_operand: 3 "register_operand" "")
  (parallel [(match_operand:SI 4 "immediate_operand" "i")])))
   )
(const_int 1]
@@ -1974,7 +1974,7 @@
   [(match_operand: 0 "register_operand" "=w")
(match_operand: 1 "register_operand" "0")
(match_operand:VSD_HSI 2 "register_operand" "w")
-   (match_operand: 3 "register_operand" "w")
+   (match_operand: 3 "register_operand" "")
(match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
@@ -1989,7 +1989,7 @@
   [(match_operand: 0 "register_operand" "=w")
(match_operand: 1 "register_operand" "0")
(match_operand:VSD_HSI 2 "register_operand" "w")
-   (match_operand: 3 "register_operand" "w")
+   (match_operand: 3 "register_operand" "")
(match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
@@ -2004,7 +2004,7 @@
   [(match_operand: 0 "register_operand" "=w")
(match_operand: 1 "register_operand" "0")
(match_operand:VSD_HSI 2 "register_operand" "w")
-   (match_operand: 3 "register_operand" "w")
+   (match_operand: 3 "register_operand" "")
(match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
@@ -2019,7 +2019,7 @@
   [(match_operand: 0 "register_operand" "=w")
(match_operand: 1 "register_operand" "0")
(match_operand:VSD_HSI 2 "register_operand" "w")
-   (match_operand: 3 "register_operand" "w")
+   (match_operand: 3 "register_operand" "")
(match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
@@ -2114,7 +2114,7 @@
(sign_extend:
   (vec_duplicate:
(vec_select:
- (match_operand: 3 "register_operand" "w")
+ (match_operand: 3 "register_operand" "")
  (parallel [(match_operand:SI 4 "immediate_operand" "i")])

  (const_int 1]
@@ -2128,7 +2128,7 @@
   [(match_operand: 0 "register_operand" "=w")
(match_operand: 1 "register_operand" "w")
(match_operand:VQ_HSI 2 "register_operand" "w")
-   (match_operand: 3 "register_operand" "w")
+   (match_operand: 3 "register_operand" "")
(match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
@@ -2144,7 +2144,7 @@
   [(ma

[PATCH] AIX pthread.h fixincludes

2012-06-19 Thread David Edelsohn
AIX 5.2 pthread.h uses the wrong number of braces for more of the
PTHREAD initializers. This patch extends the earlier patch to fix the
other broken macros.
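
To show what "wrong number of braces" means in practice, here is a small sketch with a made-up type. The AIX definitions are not reproduced in this thread, so toy_lock_t and both macros are hypothetical; the point is only the difference between one and two brace levels for a struct that wraps an aggregate.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock type: a struct whose only member is an array,
   loosely in the shape of an opaque pthread object.  */
typedef struct
{
  long words[4];
} toy_lock_t;

/* One brace level: relies on brace elision and may draw a
   -Wmissing-braces warning from GCC.  */
#define TOY_LOCK_INIT_UNDERBRACED { 0, 0, 0, 0 }

/* Two brace levels: outer for the struct, inner for the array.  */
#define TOY_LOCK_INIT_FIXED {{ 0, 0, 0, 0 }}

/* Returns 1 if both initializers produce the same object.  */
int
toy_locks_equal (void)
{
  toy_lock_t a = TOY_LOCK_INIT_UNDERBRACED;
  toy_lock_t b = TOY_LOCK_INIT_FIXED;
  assert (sizeof a == 4 * sizeof (long));
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Both forms initialize the object identically in C; the fix only adds the brace level that matches the nesting of the type, which is what the c_fix_arg strings in the patch do by turning "{" into "{{".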

* inclhack.def (aix_mutex_initializer_1, aix_cond_initializer_1,
aix_rwlock_initializer_1): New.
* fixincl.x: Regenerate.
* tests/base/pthread.h [AIX_MUTEX_INITIALIZER_1_CHECK,
AIX_COND_INITIALIZER_1_CHECK,
AIX_RWLOCK_INITIALIZER_1_CHECK]: New.

Okay?

Thanks, David
Index: inclhack.def
===================================================================
--- inclhack.def        (revision 188738)
+++ inclhack.def        (working copy)
@@ -397,7 +397,9 @@
 };
 
 /*
- *  pthread.h on AIX defines PTHREAD_ONCE_INIT without enough braces.
+ *  pthread.h on AIX defines PTHREAD_ONCE_INIT, PTHREAD_MUTEX_INITIALIZER,
+ *  PTHREAD_COND_INITIALIZER and PTHREAD_RWLOCK_INITIALIZER without enough
+ *  braces.
  */
 fix = {
 hackname  = aix_once_init_1;
@@ -425,6 +427,45 @@
"}\n";
 };
 
+fix = {
+hackname  = aix_mutex_initializer_1;
+mach  = "*-*-aix*";
+files = "pthread.h";
+select= "#define[ \t]PTHREAD_MUTEX_INITIALIZER \n"
+   "\\{ \n";
+c_fix = format;
+c_fix_arg = "#define PTHREAD_MUTEX_INITIALIZER \\\n"
+   "{{ \\\n";
+test_text = "#define PTHREAD_MUTEX_INITIALIZER \n"
+   "{ \n";
+};
+
+fix = {
+hackname  = aix_cond_initializer_1;
+mach  = "*-*-aix*";
+files = "pthread.h";
+select= "#define[ \t]PTHREAD_COND_INITIALIZER \n"
+   "\\{ \n";
+c_fix = format;
+c_fix_arg = "#define PTHREAD_COND_INITIALIZER \\\n"
+   "{{ \\\n";
+test_text = "#define PTHREAD_COND_INITIALIZER \n"
+   "{ \n";
+};
+
+fix = {
+hackname  = aix_rwlock_initializer_1;
+mach  = "*-*-aix*";
+files = "pthread.h";
+select= "#define[ \t]PTHREAD_RWLOCK_INITIALIZER \n"
+   "\\{ \n";
+c_fix = format;
+c_fix_arg = "#define PTHREAD_RWLOCK_INITIALIZER \\\n"
+   "{{ \\\n";
+test_text = "#define PTHREAD_RWLOCK_INITIALIZER \n"
+   "{ \n";
+};
+
 /*
  *  pthread.h on AIX 4.3.3 tries to define a macro without whitspace
  *  which violates a requirement of ISO C.


Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote:
> On Tue, 19 Jun 2012, William J. Schmidt wrote:
> 
> > I remember having this discussion, and I was looking for it to check on
> > the details, but I can't seem to find it either in my inbox or in the
> > archives.  Can you please point me to that again?  Sorry for the bother.
> 
> It was in the "Correct cost model for strided loads" thread.

Ah, right, thanks.  I think it will be best to make that a separate
patch in the series.  Like so:

(1) Add calls to the new interface without disturbing existing logic;
modify the profitability algorithms to query the new model for inside
costs.  Default algorithm for the model is to just sum costs as is done
today.
(1a) Split up the cost hooks (one for loads/stores with misalign parm,
one for vector_stmt with tree_code, etc.).
(x) Add heuristics to target models as desired.
(2) Handle the SLP ordering problem.
(3) Handle outside costs in the target model.
(4) Remove the now unnecessary cost fields and the calls that set them.

I'll start work on this series of patches as I have time between other
projects.
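
As a rough illustration of step (1), here is a minimal sketch in C of a
cost model whose default algorithm just sums the costs it is handed.
All names here (cost_model, default_add_stmt, and so on) are
hypothetical and are not taken from the patch series or from GCC:

```c
#include <assert.h>

/* Hypothetical statement kinds; not GCC's real enumeration.  */
enum stmt_kind { KIND_VECTOR_STMT, KIND_VECTOR_LOAD, KIND_VECTOR_STORE };

/* Hypothetical per-loop cost accumulator owned by the target model.  */
struct cost_model
{
  unsigned inside_cost;
  /* Hook that folds the cost of COUNT statements into the model.  */
  void (*add_stmt) (struct cost_model *, enum stmt_kind,
                    unsigned count, unsigned unit_cost);
};

/* Default algorithm: ignore the statement kind and sum the costs.  */
static void
default_add_stmt (struct cost_model *m, enum stmt_kind kind,
                  unsigned count, unsigned unit_cost)
{
  assert (m != 0);
  (void) kind;
  m->inside_cost += count * unit_cost;
}

/* Reset the model and install the default hook.  */
static void
cost_model_init (struct cost_model *m)
{
  assert (m != 0);
  m->inside_cost = 0;
  m->add_stmt = default_add_stmt;
}

/* What a profitability check would query for the inside cost.  */
static unsigned
cost_model_inside_cost (const struct cost_model *m)
{
  assert (m != 0);
  return m->inside_cost;
}
```

A target wanting a heuristic (step (x)) would replace the add_stmt hook
after initialization; callers keep using the same two entry points.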

Thanks,
Bill

> 
> Richard.
> 



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

> On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote:
> > On Tue, 19 Jun 2012, William J. Schmidt wrote:
> > 
> > > I remember having this discussion, and I was looking for it to check on
> > > the details, but I can't seem to find it either in my inbox or in the
> > > archives.  Can you please point me to that again?  Sorry for the bother.
> > 
> > It was in the "Correct cost model for strided loads" thread.
> 
> Ah, right, thanks.  I think it will be best to make that a separate
> patch in the series.  Like so:
> 
> (1) Add calls to the new interface without disturbing existing logic;
> modify the profitability algorithms to query the new model for inside
> costs.  Default algorithm for the model is to just sum costs as is done
> today.
> (1a) Split up the cost hooks (one for loads/stores with misalign parm,
> one for vector_stmt with tree_code, etc.).
> (x) Add heuristics to target models as desired.
> (2) Handle the SLP ordering problem.
> (3) Handle outside costs in the target model.
> (4) Remove the now unnecessary cost fields and the calls that set them.
> 
> I'll start work on this series of patches as I have time between other
> projects.

Thanks!
Richard.


Re: [PATCH] AIX pthread.h fixincludes

2012-06-19 Thread Bruce Korb
Hi David,

On Tue, Jun 19, 2012 at 7:16 AM, David Edelsohn  wrote:
> Okay?

Okay.

Cheers - Bruce


[testsuite] Clear hwcap_2 with Sun ld

2012-06-19 Thread Rainer Orth
In recent Solaris 11 Update 1 builds, the Sun assembler tags AVX2 object
files with a hardware capability that isn't cleared by the current
gcc/testsuite/gcc.target/i386/clearcap.map file.  There are some new
capabilities in  in AT_SUN_CAP_HW2, but unfortunately
the old linker map syntax has no support for setting/clearing hwcap_2,
and won't ever get it.

To deal with this situation, I've introduced a new mapfile using the v2
syntax which does support clearing hwcap_2, but now I need to determine
if the linker supports that syntax before using it.  Solaris 11 ld has
the necessary support, and it was backported to Solaris 10 Update 10.
Older Solaris 10 updates and Solaris 8/9 lack it, though.

The following patch does just that.  Tested with the appropriate runtest
invocation on i386-pc-solaris2.11 (ld v2 support), i386-pc-solaris2.9
(ld v1 support only), and x86_64-unknown-linux-gnu (GNU ld which doesn't
support either syntax).

Unless someone finds fault with the patch, I'll commit it in a day.

Rainer


2012-06-19  Rainer Orth  

* gcc.target/i386/clearcapv2.map: New file.
* gcc.target/i386/i386.exp: Try it first before clearcap.map.

# HG changeset patch
# Parent 02789d700fe014df8358c45b8dc09a6b104fbb6b
Clear hwcap_2 with Sun ld

diff --git a/gcc/testsuite/gcc.target/i386/clearcapv2.map b/gcc/testsuite/gcc.target/i386/clearcapv2.map
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/clearcapv2.map
@@ -0,0 +1,7 @@
+# clear all hardware capabilities emitted by Sun as: the tests here
+# guard against execution at runtime
+# uses mapfile v2 syntax which is the only way to clear AT_SUN_CAP_HW2 flags
+$mapfile_version 2
+CAPABILITY {
+  HW = ;
+};
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -256,12 +256,23 @@ proc check_effective_target_rtm { } {
 
 # If the linker used understands -M , pass it to clear hardware
 # capabilities set by the Sun assembler.
-set clearcap_ldflags "-Wl,-M,$srcdir/$subdir/clearcap.map"
+# Try mapfile syntax v2 first which is the only way to clear hwcap_2 flags.
+set clearcap_ldflags "-Wl,-M,$srcdir/$subdir/clearcapv2.map"
 
-if [check_no_compiler_messages mapfile executable {
+if ![check_no_compiler_messages mapfilev2 executable {
+int main (void) { return 0; }
+} $clearcap_ldflags ] {
+# If this doesn't work, fall back to the less capable v1 syntax.
+set clearcap_ldflags "-Wl,-M,$srcdir/$subdir/clearcap.map"
+
+if ![check_no_compiler_messages mapfile executable {
 	int main (void) { return 0; }
-  } $clearcap_ldflags ] {
+} $clearcap_ldflags ] {
+	unset clearcap_ldflags
+}
+}
 
+if [info exists clearcap_ldflags] {
   if { [info procs gcc_target_compile] != [list] \
 	&& [info procs saved_gcc_target_compile] == [list] } {
 rename gcc_target_compile saved_gcc_target_compile

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH][7/n] VRP and anti-ranges

2012-06-19 Thread Richard Guenther

And here is the union_ranges part.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
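
For readers following along, the disjoint range/range case of the patch
below can be sketched on plain ints.  This is only an illustration under
simplifying assumptions (int bounds instead of trees, the lower range
passed first); the type and function names are made up and do not come
from tree-vrp.c:

```c
#include <assert.h>
#include <limits.h>

enum vr_kind { KIND_RANGE, KIND_ANTI_RANGE };

struct vr
{
  enum vr_kind kind;
  int min, max;
};

/* Union of two disjoint ranges [a.min, a.max] and [b.min, b.max],
   with A entirely below B.  */
static struct vr
union_disjoint (struct vr a, struct vr b)
{
  struct vr r;

  assert (a.kind == KIND_RANGE && b.kind == KIND_RANGE);
  assert (a.min <= a.max && b.min <= b.max && a.max < b.min);

  /* Default: the convex hull of both ranges.  */
  r.kind = KIND_RANGE;
  r.min = a.min;
  r.max = b.max;

  /* If the ranges reach both ends of the type, the hole between
     them can be expressed as an anti-range instead.  */
  if (a.min == INT_MIN && b.max == INT_MAX)
    {
      int gap_min = a.max + 1;  /* Cannot overflow: a.max < b.min.  */
      int gap_max = b.min - 1;  /* Cannot overflow: b.min > a.max.  */
      if (gap_min <= gap_max)
        {
          r.kind = KIND_ANTI_RANGE;
          r.min = gap_min;
          r.max = gap_max;
        }
    }
  return r;
}
```

When the two ranges are adjacent the hole is empty, so the result stays
the convex hull, which matches the fallback in the patch.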

Richard.

2012-06-19  Richard Guenther  

* tree-vrp.c (union_ranges): New function.
(vrp_meet_1): Use union_ranges.
(vrp_meet): Dump what we union and call vrp_meet_1.

Index: gcc/tree-vrp.c
===
*** gcc/tree-vrp.c.orig 2012-06-19 15:18:34.0 +0200
--- gcc/tree-vrp.c  2012-06-19 15:23:20.803752745 +0200
*** vrp_visit_stmt (gimple stmt, edge *taken
*** 6770,6775 
--- 6770,7032 
return SSA_PROP_VARYING;
  }
  
+ /* Union the two value-ranges { *VR0TYPE, *VR0MIN, *VR0MAX } and
+{ VR1TYPE, VR0MIN, VR0MAX } and store the result
+in { *VR0TYPE, *VR0MIN, *VR0MAX }.  This may not be the smallest
+possible such range.  The resulting range is not canonicalized.  */
+ 
+ static void
+ union_ranges (enum value_range_type *vr0type,
+ tree *vr0min, tree *vr0max,
+ enum value_range_type vr1type,
+ tree vr1min, tree vr1max)
+ {
+   bool mineq = operand_equal_p (*vr0min, vr1min, 0);
+   bool maxeq = operand_equal_p (*vr0max, vr1max, 0);
+ 
+   /* [] is vr0, () is vr1 in the following classification comments.  */
+   if (mineq && maxeq)
+ {
+   /* [(  )] */
+   if (*vr0type == vr1type)
+   /* Nothing to do for equal ranges.  */
+   ;
+   else if ((*vr0type == VR_RANGE
+   && vr1type == VR_ANTI_RANGE)
+  || (*vr0type == VR_ANTI_RANGE
+  && vr1type == VR_RANGE))
+   {
+ /* For anti-range with range union the result is varying.  */
+ goto give_up;
+   }
+   else
+   gcc_unreachable ();
+ }
+   else if (operand_less_p (*vr0max, vr1min) == 1
+  || operand_less_p (vr1max, *vr0min) == 1)
+ {
+   /* [ ] ( ) or ( ) [ ]
+If the ranges have an empty intersection, result of the union
+operation is the anti-range or if both are anti-ranges
+it covers all.  */
+   if (*vr0type == VR_ANTI_RANGE
+ && vr1type == VR_ANTI_RANGE)
+   goto give_up;
+   else if (*vr0type == VR_ANTI_RANGE
+  && vr1type == VR_RANGE)
+   ;
+   else if (*vr0type == VR_RANGE
+  && vr1type == VR_ANTI_RANGE)
+   {
+ *vr0type = vr1type;
+ *vr0min = vr1min;
+ *vr0max = vr1max;
+   }
+   else if (*vr0type == VR_RANGE
+  && vr1type == VR_RANGE)
+   {
+ /* The result is the convex hull of both ranges.  */
+ if (operand_less_p (*vr0max, vr1min) == 1)
+   {
+ /* If the result can be an anti-range, create one.  */
+ if (TREE_CODE (*vr0max) == INTEGER_CST
+ && TREE_CODE (vr1min) == INTEGER_CST
+ && vrp_val_is_min (*vr0min)
+ && vrp_val_is_max (vr1max))
+   {
+ tree min = int_const_binop (PLUS_EXPR,
+ *vr0max, integer_one_node);
+ tree max = int_const_binop (MINUS_EXPR,
+ vr1min, integer_one_node);
+ if (!operand_less_p (max, min))
+   {
+ *vr0type = VR_ANTI_RANGE;
+ *vr0min = min;
+ *vr0max = max;
+   }
+ else
+   *vr0max = vr1max;
+   }
+ else
+   *vr0max = vr1max;
+   }
+ else
+   {
+ /* If the result can be an anti-range, create one.  */
+ if (TREE_CODE (vr1max) == INTEGER_CST
+ && TREE_CODE (*vr0min) == INTEGER_CST
+ && vrp_val_is_min (vr1min)
+ && vrp_val_is_max (*vr0max))
+   {
+ tree min = int_const_binop (PLUS_EXPR,
+ vr1max, integer_one_node);
+ tree max = int_const_binop (MINUS_EXPR,
+ *vr0min, integer_one_node);
+ if (!operand_less_p (max, min))
+   {
+ *vr0type = VR_ANTI_RANGE;
+ *vr0min = min;
+ *vr0max = max;
+   }
+ else
+   *vr0min = vr1min;
+   }
+ else
+   *vr0min = vr1min;
+   }
+   }
+   else
+   gcc_unreachable ();
+ }
+   else if ((maxeq || operand_less_p (vr1max, *vr0max) == 1)
+  && (mineq || operand_less_p (*vr0min, vr1min) == 1))
+ {
+   /* [ (  ) ] or [(  ) ] or [ (  )] */
+   if (*vr0type == VR_RANGE
+ && vr1type == VR_RANGE)
+   ;
+   else if (*vr0type == VR_ANTI_RANGE
+  && vr1type == VR_ANTI_RANGE)
+   {
+ *vr0type = vr1type;
+ *vr0min = vr1min;
+ *vr0

Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

> These series of patches are for the D compiler frontend for inclusion into 
> GCC.
> 
> http://www.gdcproject.org/files/gdc_frontend.patch.gz
> http://www.gdcproject.org/files/gdc_libphobos.patch.gz
> http://www.gdcproject.org/files/gdc_testsuite.patch.gz
> http://www.gdcproject.org/files/gdc_gcc.patch.gz

Please provide GNU ChangeLog entries for each patch, for each relevant 
ChangeLog file.  It would be best to post those in plain text to the list, 
even if the patches themselves are too big.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AARCH64]: Invent new regclass - FP low regs.

2012-06-19 Thread Marcus Shawcroft

On 19/06/12 15:03, Tejas Belagod wrote:


Hi,

The attached patch invents a new register class V0 - V15 that is needed for some
lane variants of AdvSIMD instructions that can only take V0 - V15 as their
indexed register when working on half-word type.

Regression tests are happy. OK?


OK
/Marcus



Re: [PATCH 4/4] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

> --- gcc-4.8-20120617/gcc/doc/install.texi 2012-05-29 15:14:06.0 +0100
> +++ gcc-4.8/gcc/doc/install.texi  2012-06-18 20:39:45.058591380 +0100
> @@ -1360,12 +1360,12 @@ their runtime libraries should be built.
>  grep language= */config-lang.in
>  @end smallexample
>  Currently, you can use any of the following:
> -@code{all}, @code{ada}, @code{c}, @code{c++}, @code{fortran},
> +@code{all}, @code{ada}, @code{c}, @code{c++}, @code{d}, @code{fortran},
>  @code{go}, @code{java}, @code{objc}, @code{obj-c++}.
>  Building the Ada compiler has special requirements, see below.
>  If you do not pass this flag, or specify the option @code{all}, then all
>  default languages available in the @file{gcc} sub-tree will be configured.
> -Ada, Go and Objective-C++ are not default languages; the rest are.
> +Ada, D, Go and Objective-C++ are not default languages; the rest are.

Maybe this should be true, but I don't see a build_by_default=no setting 
in config-lang.in (in gdc_frontend.patch.gz) to make it so.

> --- gcc-4.8-20120617/gcc/doc/standards.texi   2011-12-21 17:53:58.0 
> +
> +++ gcc-4.8/gcc/doc/standards.texi2012-04-22 17:11:38.553880036 +0100
> @@ -289,6 +289,16 @@ a specific version.  In general GCC trac
>  closely, and any given release will support the language as of the
>  date that the release was frozen.
>  
> +@section D language
> +
> +The D language continues to evolve as of this writing; see the
> +@uref{http://golang.org/@/doc/@/go_spec.html, current language
> +specifications}.  At present there are no specific versions of Go, and
> +there is no way to describe the language supported by GCC in terms of
> +a specific version.  In general GCC tracks the evolving specification
> +closely, and any given release will support the language as of the
> +date that the release was frozen.

Referring to Go in a section about D doesn't make sense.

I don't see entries in contrib.texi in this patch.

I'd also expect contrib/gcc_update to be updated to handle timestamp 
ordering for generated files in libphobos.

Are you volunteering to be appointed maintainer for this front end by the 
SC?

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix vrp68 testcase

2012-06-19 Thread Richard Guenther

This fixes the testcase to match reality and updates the comments in it
accordingly.

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-06-19  Richard Guenther  

* gcc.dg/tree-ssa/vrp68.c: Adjust testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/vrp68.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/vrp68.c   (revision 188780)
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp68.c   (working copy)
@@ -8,17 +8,11 @@ void test1 (int i, int j, int b)
   RANGE(i, 2, 6);
   ANTI_RANGE(j, 1, 7);
   MERGE(b, i, j);
-  CHECK_ANTI_RANGE(i, 7, 7);
   CHECK_ANTI_RANGE(i, 1, 1);
-  /* If we swap the anti-range tests the ~[6, 6] test is never eliminated.  */
 }
 int main() { }
 
-/* While subsequent VRP/DOM passes manage to even recognize the ~[6, 6]
-   test as redundant a single VRP run will arbitrarily choose ~[0, 0] when
-   merging [1, 5] with ~[0, 6] so the first VRP pass can only eliminate
-   the ~[0, 0] check as redundant.  */
+/* VRP will arbitrarily choose ~[1, 1] when merging [2, 6] with ~[1, 7].  */
 
-/* { dg-final { scan-tree-dump-times "link_error" 0 "vrp1" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "link_error" 1 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "vrp1" } } */
 /* { dg-final { cleanup-tree-dump "vrp1" } } */


Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-19 Thread Richard Henderson
On 2012-06-18 23:38, Uros Bizjak wrote:
> On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson  wrote:
>> On 2012-06-18 13:19, Uros Bizjak wrote:
>>>/* ??? The builtin doesn't understand that the PCMPESTRI read from
>>>memory need not be aligned.  */
>>> -  __asm ("%vpcmpestri $0, (%1), %2"
>>> -  : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>>> +  sv = __builtin_ia32_loaddqu ((const char *) s);
>>> +  index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>>> +
>>
>>
>> Surely the comment can be removed too then?
> 
> I'm not sure there. The builtin, as defined, expects V16QI operand
> with xm constraint.

Fair enough.  I'm ok with the patch as-is.


r~




Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Steven Bosscher
Hello,

I had a very quick look through the gdc_frontend patch. Below are a
couple of comments on it:

> http://www.gdcproject.org/files/gdc_frontend.patch.gz
>
> [PATCH 1/4]:
> The D compiler frontend
>  -  gcc/d

How did you test this? You include rtl.h/expr.h in d-builtins.c and
d-gcc-includes.h, which should both be in ALL_HOST_FRONTEND_OBJS and
fail to build because IN_GCC_FRONTEND is defined and GCC_RTL_H is
poisoned. See system.h:

/* Front ends should never have to include middle-end headers.  Enforce
   this by poisoning the header double-include protection defines.  */
#ifdef IN_GCC_FRONTEND
#pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H
#endif

Do you somehow bypass the normal build system? Or maybe you don't
include system.h? Either way, front ends should never have to include
RTL headers.

BTW you also include output.h in those two files, and I am about two
patches away from adding output.h to the list of headers that no front
end should ever include (a front end should never have to write out
assembly). Can you please check what you need output.h for, and fix
this?


What are you calling targetm.asm_out.output_mi_thunk and
targetm.asm_out.generate_internal_label for? Thunks and aliases should
go through cgraphunit.

(NB: This also means that this front end cannot work with LTO. IMHO we
shouldn't let in new front ends that don't work with LTO.)


Many functions have no leading comment, and other GNU coding standard
requirements are not followed either. Those should IMHO be fixed also,
before this front end can be accepted.


There is this comment:
+/* GCC does not support jumps from asm statements.

This isn't really true anymore, as your patch also notes:
+   --
+   %% Fix for GCC-4.5+
+   GCC now accepts a 5th operand, ASM_LABELS.
(...)
+   For prior versions of gcc, this requires a backpatch.

It seems to me that if this front end is contributed, handling of
prior versions of GCC is no longer necessary; that code should just
be removed.


+
+   case Op_de:
+#ifndef TARGET_80387
+#define XFmode TFmode
+#endif
+ mode = XFmode; // not TFmode

What is this hack for? This is not the way to find the right mode for
this operation.

+#ifdef TARGET_80387
+#include "d-asm-i386.h"
+#else
+#define D_NO_INLINE_ASM_AT_ALL
+#endif
+
+/* Apple GCC extends ASM_EXPR to five operands; cannot use build4. */

Idem here. And Apple GCC is irrelevant too, if this front end lands on
FSF trunk.

What is d/d-asm-i386.h for? It looks like i386 is a special case
throughout the front end.


In d-gcc-tree.h:
+// normally include config.h (hconfig.h, tconfig.h?), but that
+// includes things that cause problems, so...
+
+union tree_node;
+typedef union tree_node *tree;

See coretypes.h.

Ciao!
Steven


Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

> [PATCH 1/4]:
> The D compiler frontend
>  -  gcc/d

Only selectively reviewed, but here are some comments:

> diff -Naur gcc-4.8-20120617/gcc/d/asmstmt.cc gcc-4.8/gcc/d/asmstmt.cc
> --- gcc-4.8-20120617/gcc/d/asmstmt.cc   1970-01-01 01:00:00.0 +0100
> +++ gcc-4.8/gcc/d/asmstmt.cc2012-06-05 13:42:09.044876794 +0100
> @@ -0,0 +1,2731 @@
> +// asmstmt.cc -- D frontend for GCC.
> +// Originally contributed by David Friedman
> +// Maintained by Iain Buclaw
> +
> +// GCC is free software; you can redistribute it and/or modify it under

Every file more than ten lines long needs a copyright notice as well as 
the license notice.  See 
 for 
instructions, including the case of multiple copyright holders - though if 
there are any significant (more than fifteen lines of copyrightable text 
or so) contributors not assigning copyright to the FSF then special 
approval from the FSF will be needed to include the front end.

I would say that the files in dfrontend/ need copyright and license 
notices as well, though not necessarily in exactly GNU form.  Thus, you 
will need to get Digital Mars to approve appropriate notices for those 
files (aav.c is the first I see that's lacking such a notice but is long 
enough to need one; likewise async.c, gnuc.c, speller.c; rmem.c just says 
"All Rights Reserved" and needs a proper license notice like other files; 
likewise rmem.h).

> +#ifdef TARGET_80387
> +#include "d-asm-i386.h"
> +#else
> +#define D_NO_INLINE_ASM_AT_ALL
> +#endif

Ugh.  We want to move away from target macros, and this isn't even a 
proper target macro.  It would be better to define target hooks for the D 
inline asm support - possibly with a D-specific hook structure, like the C 
hooks structure.  (Even if you avoid needing copyright assignments for the 
front end itself, such hook implementations will probably need to be 
assigned.)

> +/* Apple GCC extends ASM_EXPR to five operands; cannot use build4. */

I don't see why that should be in the least relevant to a contribution to 
FSF GCC.  If you can do things in a more natural way in FSF GCC, then do 
so.

Each function in the GCC-specific parts of the code should have a comment 
on it, explaining the semantics of the function, its operands and its 
return value if any.

For new code in GCC, it's better to use snprintf than sprintf.
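
A minimal sketch of what that looks like in practice; format_label is a
made-up helper for illustration, not something from the patch:

```c
#include <assert.h>
#include <stdio.h>

/* Format "NAME.N" into BUF, which has room for SIZE bytes.
   Returns 0 on success, or -1 if the text did not fit (BUF then
   holds a NUL-terminated, truncated string).  */
static int
format_label (char *buf, size_t size, const char *name, unsigned n)
{
  int len;

  assert (buf != 0 && size > 0 && name != 0);

  /* Unlike sprintf, snprintf never writes more than SIZE bytes and
     returns the length the full output would have needed.  */
  len = snprintf (buf, size, "%s.%u", name, n);
  if (len < 0 || (size_t) len >= size)
    return -1;
  return 0;
}
```

Checking the return value against the buffer size is what turns a
silent overflow into a detectable truncation.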

> +extern void decode_options (struct gcc_options *, struct gcc_options *,

Please use appropriate headers rather than local declarations of GCC 
functions.

> +// d-bi-attr.h -- D frontend for GCC.

This file looks like it's largely copied from elsewhere in GCC.  In such a 
case, please work out a better way to refactor the code so that it can be 
shared rather than duplicated.  (Again, such common code will no doubt 
need full copyright assignments.)

I don't know whether your assignment "Assigns Past and Future Changes to 
the GNU D Compiler (GDC)" covers changes elsewhere in GCC.  But I expect a 
general assignment for GCC to be needed for any refactoring involved in 
adapting common code for use in D.  (And such refactoring would be a new 
contribution so there shouldn't be any issues with unknown previous 
contributors without assignments - those would only arise if significant 
amounts of previously written D front-end code are being moved into common 
code.)

> +#if D_VA_LIST_TYPE_VOIDPTR

Please avoid #if conditionals on anything that could be a target property.  
It's generally better to use "if" conditionals instead of #if, so that all 
cases are checked for syntax in all compiles.

I see #if conditions on defines such as "V2" and "V1" as well.  Unless 
something is an *existing* target macro or configure macro in GCC, use 
"if" conditions and ensure that the macro is defined to true or false 
values (rather than defined or not defined).  But if a macro is always 
defined, or never defined, then just avoiding the conditionals may be 
better.
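
As a sketch of the difference, assuming a hypothetical macro
USE_VOIDPTR_VA_LIST that is always defined to 0 or 1 (the name and the
enum are invented for this example):

```c
#include <assert.h>

/* Hypothetical configuration macro: always defined, to 0 or 1.  */
#ifndef USE_VOIDPTR_VA_LIST
#define USE_VOIDPTR_VA_LIST 0
#endif

enum va_list_kind { VA_LIST_VOIDPTR, VA_LIST_TARGET };

static enum va_list_kind
choose_va_list_kind (void)
{
  assert (USE_VOIDPTR_VA_LIST == 0 || USE_VOIDPTR_VA_LIST == 1);

  /* Both arms are parsed and type-checked in every build; the
     optimizer drops the dead one.  With #if, the disabled arm
     would never be seen by the compiler at all.  */
  if (USE_VOIDPTR_VA_LIST)
    return VA_LIST_VOIDPTR;
  else
    return VA_LIST_TARGET;
}
```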

The gcc/d/dfrontend/readme.txt says:

> +These sources are free, they are redistributable and modifiable
> +under the terms of the GNU General Public License (attached as gpl.txt),
> +or the Artistic License (attached as artistic.txt).

But that license is GPLv2.  We need an explicit notice (approved by the 
copyright holder) saying that *any later version* may be used.  If Digital 
Mars wishes to license the separately maintained dfrontend/ code under 
GPLv2+ rather than GPLv3+, that's fine, just like the gofrontend/ code is 
under a permissive license - but it needs to be explicit that any later 
version may be used.

I haven't studied the details of the dfrontend/ code.  But if you are to 
follow the Go model - separately maintained code for the front end proper 
that may be used verbatim in multiple compilers, with the code outside 
dfrontend/ doing everything related to interfacing with GCC, and only 
what's related to interfacing with GCC - then the

> 

[PATCH, i386]: Introduce FRNDINT_ROUNDING int iterator

2012-06-19 Thread Uros Bizjak
Hello!

2012-06-19  Uros Bizjak  

* config/i386/i386.md (FRNDINT_ROUNDING): New int iterator.
(rounding): New int attribute.
(ROUNDING): Ditto.
(frndintxf2_): Macroize insn from
frndintxf2_{floor,ceil,trunc} using FRNDINT_ROUNDING int iterator.
(frndintxf2__i387): Macroize insn from
frndintxf2_{floor,ceil,trunc}_i387 using FRNDINT_ROUNDING int iterator.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
Will be committed to mainline SVN.

BTW: A follow-up patch will also macroize fist2_{floor,ceil} and friends.

Uros.
Index: i386.md
===
--- i386.md (revision 188781)
+++ i386.md (working copy)
@@ -15099,11 +15099,26 @@
   DONE;
 })
 
+(define_int_iterator FRNDINT_ROUNDING
+   [UNSPEC_FRNDINT_FLOOR
+UNSPEC_FRNDINT_CEIL
+UNSPEC_FRNDINT_TRUNC])
+
+(define_int_attr rounding
+   [(UNSPEC_FRNDINT_FLOOR "floor")
+(UNSPEC_FRNDINT_CEIL "ceil")
+(UNSPEC_FRNDINT_TRUNC "trunc")])
+
+(define_int_attr ROUNDING
+   [(UNSPEC_FRNDINT_FLOOR "FLOOR")
+(UNSPEC_FRNDINT_CEIL "CEIL")
+(UNSPEC_FRNDINT_TRUNC "TRUNC")])
+
 ;; Rounding mode control word calculation could clobber FLAGS_REG.
-(define_insn_and_split "frndintxf2_floor"
+(define_insn_and_split "frndintxf2_"
   [(set (match_operand:XF 0 "register_operand")
(unspec:XF [(match_operand:XF 1 "register_operand")]
-UNSPEC_FRNDINT_FLOOR))
+  FRNDINT_ROUNDING))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_USE_FANCY_MATH_387
&& flag_unsafe_math_optimizations
@@ -15112,30 +15127,30 @@
   "&& 1"
   [(const_int 0)]
 {
-  ix86_optimize_mode_switching[I387_FLOOR] = 1;
+  ix86_optimize_mode_switching[I387_] = 1;
 
   operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED);
-  operands[3] = assign_386_stack_local (HImode, SLOT_CW_FLOOR);
+  operands[3] = assign_386_stack_local (HImode, SLOT_CW_);
 
-  emit_insn (gen_frndintxf2_floor_i387 (operands[0], operands[1],
-   operands[2], operands[3]));
+  emit_insn (gen_frndintxf2__i387 (operands[0], operands[1],
+operands[2], operands[3]));
   DONE;
 }
   [(set_attr "type" "frndint")
-   (set_attr "i387_cw" "floor")
+   (set_attr "i387_cw" "")
(set_attr "mode" "XF")])
 
-(define_insn "frndintxf2_floor_i387"
+(define_insn "frndintxf2__i387"
   [(set (match_operand:XF 0 "register_operand" "=f")
(unspec:XF [(match_operand:XF 1 "register_operand" "0")]
-UNSPEC_FRNDINT_FLOOR))
+  FRNDINT_ROUNDING))
(use (match_operand:HI 2 "memory_operand" "m"))
(use (match_operand:HI 3 "memory_operand" "m"))]
   "TARGET_USE_FANCY_MATH_387
&& flag_unsafe_math_optimizations"
   "fldcw\t%3\n\tfrndint\n\tfldcw\t%2"
   [(set_attr "type" "frndint")
-   (set_attr "i387_cw" "floor")
+   (set_attr "i387_cw" "")
(set_attr "mode" "XF")])
 
 (define_expand "floorxf2"
@@ -15357,45 +15372,6 @@
   DONE;
 })
 
-;; Rounding mode control word calculation could clobber FLAGS_REG.
-(define_insn_and_split "frndintxf2_ceil"
-  [(set (match_operand:XF 0 "register_operand")
-   (unspec:XF [(match_operand:XF 1 "register_operand")]
-UNSPEC_FRNDINT_CEIL))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  ix86_optimize_mode_switching[I387_CEIL] = 1;
-
-  operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED);
-  operands[3] = assign_386_stack_local (HImode, SLOT_CW_CEIL);
-
-  emit_insn (gen_frndintxf2_ceil_i387 (operands[0], operands[1],
-  operands[2], operands[3]));
-  DONE;
-}
-  [(set_attr "type" "frndint")
-   (set_attr "i387_cw" "ceil")
-   (set_attr "mode" "XF")])
-
-(define_insn "frndintxf2_ceil_i387"
-  [(set (match_operand:XF 0 "register_operand" "=f")
-   (unspec:XF [(match_operand:XF 1 "register_operand" "0")]
-UNSPEC_FRNDINT_CEIL))
-   (use (match_operand:HI 2 "memory_operand" "m"))
-   (use (match_operand:HI 3 "memory_operand" "m"))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations"
-  "fldcw\t%3\n\tfrndint\n\tfldcw\t%2"
-  [(set_attr "type" "frndint")
-   (set_attr "i387_cw" "ceil")
-   (set_attr "mode" "XF")])
-
 (define_expand "ceilxf2"
   [(use (match_operand:XF 0 "register_operand"))
(use (match_operand:XF 1 "register_operand"))]
@@ -15613,45 +15589,6 @@
   DONE;
 })
 
-;; Rounding mode control word calculation could clobber FLAGS_REG.
-(define_insn_and_split "frndintxf2_trunc"
-  [(set (match_operand:XF 0 "register_operand")
-   (unspec:XF [(match_operand:XF 1 "register_operand")]
-UNSPEC_FRNDINT_TRUNC))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  

Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

> http://www.gdcproject.org/files/gdc_libphobos.patch.gz

Same comments as before about FSF postal addresses.

Although runtime libraries need not be assigned to the FSF (as per the GCC 
Mission Statement), all significant files should still have copyright and 
license notices (approved by all significant contributors) so that people 
know the free software terms under which they may be used.  E.g., 
libphobos/libdruntime/config/x3.c appears to be missing such notices.  
Without a license (or a dedication to the public domain), a file is 
presumptively copyright and has no license for anyone to use it at all.

> +if true; then

"if true" seems odd; if you have a good reason for it, you need to comment 
it.

> +# generated automatically by aclocal 1.9.6 -*- Autoconf -*-

Please use the standard documented autoconf/automake versions for GCC 
(autoconf 2.64, automake 1.11.1).

> diff -Naur gcc-4.8-20120617/libphobos/autom4te.cache/output.0 
> gcc-4.8/libphobos/autom4te.cache/output.0

We don't check in autom4te.cache directories.

> +# libphobos is usually a symlink to gcc/d/phobos, so libphobos/..

No it's not.  No runtime libraries should go under gcc/ any more at all.

> +dnl Copied from libstdc++-v3/acinclude.m4.  Indeed, multilib will not work

Refactor into the config/ directory, don't copy.

> \ No newline at end of file

Add any missing newlines to text files in all patches.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

> http://www.gdcproject.org/files/gdc_testsuite.patch.gz

I have no comments on this patch for now.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch] Adjustments for Windows x64 SEH

2012-06-19 Thread Richard Henderson
On 2012-06-18 05:22, Tristan Gingold wrote:
> +  /* Win64 SEH, very large frames need a frame-pointer as maximum stack
> + allocation is 4GB (add a safety guard for saved registers).  */
> +  if (TARGET_64BIT_MS_ABI && get_frame_size () + 4096 > SEH_MAX_FRAME_SIZE)
> +return true;

Elsewhere you say this is an upper bound for stack use by the prologue.
It's clearly a wild guess.  The maximum stack use is 10*sse + 8*int 
registers saved, which is a lot less than 4096.

That said, I'm ok with *using* 4096 so long as the comment clearly
states that it's a large over-estimate.  I do suggest, however, folding
this into the SEH_MAX_FRAME_SIZE value, and expanding on the comment
there.  I see no practical difference between 0x8000 and 0x7fffe000
being the limit.

> +/* Output assembly code to get the establisher frame (Windows x64 only).
> +   This corresponds to what will be computed by Windows from Frame Register
> +   and Frame Register Offset fields of the UNWIND_INFO structure.  Since
> +   these values are computed very late (by ix86_expand_prologue), we cannot
> +   express this using only RTL.  */
> +
> +const char *
> +ix86_output_establisher_frame (rtx target)
> +{
> +  if (!frame_pointer_needed)
> +{
> +  /* Note that we have advertized an lea operation.  */
> +  output_asm_insn ("lea{q}\t{0(%%rsp), %0|%0, 0[rsp]}", &target);
> +}
> +  else
> +{
> +  rtx xops[3];
> +  struct ix86_frame frame;
> +
> +  /* Recompute the frame layout here.  */
> +  ix86_compute_frame_layout (&frame);
> +
> +  /* Closely follow how the frame pointer is set in
> +  ix86_expand_prologue.  */
> +  xops[0] = target;
> +  xops[1] = hard_frame_pointer_rtx;
> +  if (frame.hard_frame_pointer_offset == frame.reg_save_offset)
> + xops[2] = GEN_INT (0);
> +  else
> + xops[2] = GEN_INT (-(frame.stack_pointer_offset
> +  - frame.hard_frame_pointer_offset));
> +  output_asm_insn ("lea{q}\t{%a2(%1), %0|%0, %a2[%1]}", xops);

This is what register elimination is for; the value substitution happens
during reload.

Now, one *could* add a new pseudo-hard-register for this (we support as
many register eliminations as needed), but before we do that we need to
decide if we can adjust the soft frame pointer to be the value required.
If so, you can then rely on the existing __builtin_frame_address.  Which
is a very attractive sounding solution.  I'm 99% sure moving the sfp will work.


r~


Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant

2012-06-19 Thread Jiří Hruška
On Tue, Jun 19, 2012 at 10:54 AM, Richard Guenther
 wrote:
> The issue is that your testcase is invalid.
>    int x = ret(*(&fooS + i));
> this access is only ever valid for i == 0 as otherwise you are creating
> a pointer that points outside of the object fooS.

Richard,

thanks for your reply.

The testcase is also invalid for other reasons, a big one being that the
automatic sorting and merging of sections with a dollar sign in their
names is a Windows-originated extension used for the PE target only, so
it does not work elsewhere. Sorry about that, I'll refrain from using
anything non-standard here.


Accessing outside object bounds is IMO a common C practice allowed by
the existence of pointers. This exact technique is used for
decentralized lists created during compile-time, be it extensible
handler/hook structures, pointers to init/fini functions etc. It has
notable use e.g. in Linux kernel [1], [2].

The programmer places defined data to a special linker section in
individual compilation units, then traverses it using
linker-provided symbols (e.g. ld creates __start_ and
__end_ automatically), as test0.c shows:
  $ gcc -O1 -m32 -fno-toplevel-reorder test0.c && ./a.out
  0: 1
  1: 2
  2: 3

The sole reason for messing with the section attributes is to keep the
values together. Because I can force the order (to the necessary
extent) by -fno-toplevel-reorder, the program can be changed to use
just bounding variables without any linker magic (test1.c):
  $ gcc -O1 -m32 -fno-toplevel-reorder test1.c && ./a.out
  0: 1
  1: 2
  2: 3
The only changes in the code are removing the section attributes and
adding offset by one, skipping the starting element (as __start_foo
has a size now).

Now, changing the end condition from test for the end address to test
for the end sentinel -1 and duplicating the printf() line (to hit the
right optimization spot), something weird happens (test2.c):
  $ gcc -O1 -m32 -fno-toplevel-reorder test2.c && ./a.out
  0: 1
  0: -1
  1: 2
  1: -1
  2: 3
  2: -1
Why is the second line in each iteration different from the first? It
should be printing exactly the same expression.
Analyzing the dom phase log shows the memory access is optimized to
constant value of the base variable, hence -1.
And without optimization, both of them are correct:
  $ gcc -O0 -m32 test2.c && ./a.out
  0: 1
  0: 1
  1: 2
  1: 2
  2: 3
  2: 3

That is the problem I am talking about and which the patch aims to address.

Jiri


[1] 
http://www.compsoc.man.ac.uk/~moz/kernelnewbies/documents/initcall/index.html
[2] http://lkml.indiana.edu/hypermail/linux/kernel/0706.2/2552.html
#include <stdio.h>

__attribute__((section("foo"))) const int foo1 = 1;
__attribute__((section("foo"))) const int foo2 = 2;
__attribute__((section("foo"))) const int foo3 = 3;

extern const int __start_foo, __stop_foo;

int main(void)
{
  int i;

  i = 0;
  do {
    printf("%d: %d\n", i, *(&__start_foo + i));
    i++;
  } while(&__start_foo + i != &__stop_foo);

  return 0;
}
#include <stdio.h>

const int __start_foo = -1;
const int foo1 = 1;
const int foo2 = 2;
const int foo3 = 3;
const int __stop_foo = -1;

int main(void)
{
  int i;

  i = 0;
  do {
    printf("%d: %d\n", i, *(&__start_foo + 1 + i));
    i++;
  } while(&__start_foo + 1 + i != &__stop_foo);

  return 0;
}
#include <stdio.h>

const int __start_foo = -1;
const int foo1 = 1;
const int foo2 = 2;
const int foo3 = 3;
const int __stop_foo = -1;

int main(void)
{
  int i;

  i = 0;
  do {
    printf("%d: %d\n", i, *(&__start_foo + 1 + i));
    printf("%d: %d\n", i, *(&__start_foo + 1 + i));
    i++;
  } while(*(&__start_foo + 1 + i) != -1);

  return 0;
}
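
For comparison, a sketch (not part of the original attachments) of the sentinel loop written so that every access stays inside one object, which is the property Richard's objection is about. It gives up the decentralized layout that motivated the testcases, so it is only an illustration of the valid form; the names foo and count_until_sentinel are hypothetical.

```c
/* All values, including the terminating sentinel, live in a single
   array object, so each foo[i] read below stays inside that object.  */
static const int foo[] = { 1, 2, 3, -1 };

/* Count the elements that precede the -1 sentinel.  */
int
count_until_sentinel (void)
{
  int i = 0;

  while (foo[i] != -1)
    i++;
  return i;
}
```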


Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)

2012-06-19 Thread Sterling Augustine
On Wed, Jun 13, 2012 at 10:47 PM, Jason Merrill  wrote:
> On 06/13/2012 04:26 PM, Sterling Augustine wrote:
>>>
>>> I lean toward -g myself, since there doesn't seem to be a strong rule one
>>> way or the other.
>>
>>
>> Unless there are further comments, I'll stick with -g then.
>>
>> I think that covers all the comments, so I think I will commit this
>> Friday morning unless I hear anything further.
>
>
> Weren't you going to repost the patch first?  :)

I hate how codereview.appspot.com doesn't connect some messages properly.

After this prompting, I re-posted the patch here:

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00949.html

As this has addressed all previous comments, and barring any
objections, I'll check it in tomorrow morning.

Sterling


Re: [PATCH 2/3] Add XLP-specific atomic instructions and tweaks.

2012-06-19 Thread Richard Sandiford
Maxim Kuvyrkov  writes:
> The only other change that I made that was not in your comments is the
> addition of "b" mips_print_operand specifier.  The LDADD and SWAP
> instructions accept their address as a plain register without
> parentheses,

Ouch.

> so I've added the specifier to skip outputting parentheses.

Yeah, good idea.

Patch is OK, thanks.

Richard


Re: [PATCH, i386]: Introduce FIST_ROUNDING int iterator

2012-06-19 Thread Uros Bizjak
Hello!

2012-06-19  Uros Bizjak  

* config/i386/i386.md (FIST_ROUNDING): New int iterator.
(rounding): Handle UNSPEC_FIST_{FLOOR,CEIL}.
(ROUNDING): Ditto.
(*fist2__1): Macroize insn from
*fist2_{floor,ceil}_1 using FIST_ROUNDING int iterator.
(fistdi2_): Macroize insn from
fistdi2_{floor,ceil} using FIST_ROUNDING int iterator.
(fistdi2__with_temp and splitters): Macroize insn and
corresponding splitters from fistdi2_{floor,ceil} and corresponding
splitters using FIST_ROUNDING int iterator.
(fist2_): Macroize insn from
fist2_{floor,ceil} using FIST_ROUNDING int iterator.
(fist2__with_temp and splitters): Macroize insn and
corresponding splitters from fist2_{floor,ceil} and corresponding
splitters using FIST_ROUNDING int iterator.
(lxf2): Macroize expander from l{floor,ceil}xf2
using FIST_ROUNDING int iterator.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 188783)
+++ config/i386/i386.md (working copy)
@@ -15104,15 +15104,23 @@
 UNSPEC_FRNDINT_CEIL
 UNSPEC_FRNDINT_TRUNC])
 
+(define_int_iterator FIST_ROUNDING
+   [UNSPEC_FIST_FLOOR
+UNSPEC_FIST_CEIL])
+
 (define_int_attr rounding
[(UNSPEC_FRNDINT_FLOOR "floor")
 (UNSPEC_FRNDINT_CEIL "ceil")
-(UNSPEC_FRNDINT_TRUNC "trunc")])
+(UNSPEC_FRNDINT_TRUNC "trunc")
+(UNSPEC_FIST_FLOOR "floor")
+(UNSPEC_FIST_CEIL "ceil")])
 
 (define_int_attr ROUNDING
[(UNSPEC_FRNDINT_FLOOR "FLOOR")
 (UNSPEC_FRNDINT_CEIL "CEIL")
-(UNSPEC_FRNDINT_TRUNC "TRUNC")])
+(UNSPEC_FRNDINT_TRUNC "TRUNC")
+(UNSPEC_FIST_FLOOR "FLOOR")
+(UNSPEC_FIST_CEIL "CEIL")])
 
 ;; Rounding mode control word calculation could clobber FLAGS_REG.
 (define_insn_and_split "frndintxf2_"
@@ -15205,174 +15213,59 @@
   DONE;
 })
 
-(define_insn_and_split "*fist2_floor_1"
-  [(set (match_operand:SWI248x 0 "nonimmediate_operand")
-   (unspec:SWI248x [(match_operand:XF 1 "register_operand")]
-   UNSPEC_FIST_FLOOR))
-   (clobber (reg:CC FLAGS_REG))]
+(define_expand "ceilxf2"
+  [(use (match_operand:XF 0 "register_operand"))
+   (use (match_operand:XF 1 "register_operand"))]
   "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
+   && flag_unsafe_math_optimizations"
 {
-  ix86_optimize_mode_switching[I387_FLOOR] = 1;
+  if (optimize_insn_for_size_p ())
+FAIL;
+  emit_insn (gen_frndintxf2_ceil (operands[0], operands[1]));
+  DONE;
+})
 
-  operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED);
-  operands[3] = assign_386_stack_local (HImode, SLOT_CW_FLOOR);
-  if (memory_operand (operands[0], VOIDmode))
-emit_insn (gen_fist2_floor (operands[0], operands[1],
- operands[2], operands[3]));
+(define_expand "ceil2"
+  [(use (match_operand:MODEF 0 "register_operand"))
+   (use (match_operand:MODEF 1 "register_operand"))]
+  "(TARGET_USE_FANCY_MATH_387
+&& (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+   || TARGET_MIX_SSE_I387)
+&& flag_unsafe_math_optimizations)
+   || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
+   && !flag_trapping_math)"
+{
+  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
+  && !flag_trapping_math)
+{
+  if (TARGET_ROUND)
+   emit_insn (gen_sse4_1_round2
+  (operands[0], operands[1], GEN_INT (ROUND_CEIL)));
+  else if (optimize_insn_for_size_p ())
+   FAIL;
+  else if (TARGET_64BIT || (mode != DFmode))
+   ix86_expand_floorceil (operands[0], operands[1], false);
+  else
+   ix86_expand_floorceildf_32 (operands[0], operands[1], false);
+}
   else
 {
-  operands[4] = assign_386_stack_local (mode, SLOT_TEMP);
-  emit_insn (gen_fist2_floor_with_temp (operands[0], operands[1],
- operands[2], operands[3],
- operands[4]));
-}
-  DONE;
-}
-  [(set_attr "type" "fistp")
-   (set_attr "i387_cw" "floor")
-   (set_attr "mode" "")])
+  rtx op0, op1;
 
-(define_insn "fistdi2_floor"
-  [(set (match_operand:DI 0 "memory_operand" "=m")
-   (unspec:DI [(match_operand:XF 1 "register_operand" "f")]
-  UNSPEC_FIST_FLOOR))
-   (use (match_operand:HI 2 "memory_operand" "m"))
-   (use (match_operand:HI 3 "memory_operand" "m"))
-   (clobber (match_scratch:XF 4 "=&1f"))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations"
-  "* return output_fix_trunc (insn, operands, false);"
-  [(set_attr "type" "fistp")
-   (set_attr "i387_cw" "floor")
-   (set_attr "mode" "DI")])
+  if (optimize_insn_for_size_p ())
+   FAIL

Re: [PATCH 2/3] Use synth_mult for vector multiplies vs scalar constant

2012-06-19 Thread Richard Henderson
On 2012-06-16 04:19, Eric Botcazou wrote:
>> @@ -179,7 +179,11 @@ extern const unsigned char
>> mode_class[NUM_MACHINE_MODES];
>>
>>  extern CONST_MODE_SIZE unsigned char mode_size[NUM_MACHINE_MODES];
>>  #define GET_MODE_SIZE(MODE)((unsigned short) mode_size[MODE])
>> -#define GET_MODE_BITSIZE(MODE) ((unsigned short) (GET_MODE_SIZE (MODE) *
>> BITS_PER_UNIT)) +
>> +#define GET_MODE_BITSIZE(MODE) \
>> +  ((unsigned short) (GET_MODE_SIZE (MODE) * BITS_PER_UNIT))
>> +#define GET_MODE_UNIT_BITSIZE(MODE) \
>> +  ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT))
>>
>>  /* Get the number of value bits of an object of mode MODE.  */
>>  extern const unsigned short mode_precision[NUM_MACHINE_MODES];
> 
> Can you move GET_MODE_UNIT_BITSIZE to after GET_MODE_UNIT_SIZE, changing 
> "size 
> in bytes" to "size in bytes and bits" in the comment just above?  Because the 
> overloading of UNIT in the macro makes the whole thing slightly confusing. :-)
> 

Done in the committed patch.


r~



Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant

2012-06-19 Thread Andreas Schwab
Jiří Hruška  writes:

> #include <stdio.h>
>
> __attribute__((section("foo"))) const int foo1 = 1;
> __attribute__((section("foo"))) const int foo2 = 2;
> __attribute__((section("foo"))) const int foo3 = 3;
>
> extern const int __start_foo, __stop_foo;

Declare them as arrays.

extern const int __start_foo[], __stop_foo[];

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH] backport darwin12 fixes to gcc-4_7-branch

2012-06-19 Thread Mike Stump
Ok.


Re: [RFC 0/3] Stuff related to pr53533

2012-06-19 Thread Richard Henderson
On 2012-06-15 13:57, Richard Henderson wrote:
> Bootstrapped and tested on x86_64, but I'll leave some time for
> comment before committing any of this.

Patches now committed.


r~


Re: [PATCH] Improve pattern recognizer for division by constant (PR tree-optimization/51581)

2012-06-19 Thread Richard Henderson
On 2012-06-18 22:46, Jakub Jelinek wrote:
> On Mon, Jun 18, 2012 at 04:44:21PM -0700, Richard Henderson wrote:
>> On 2012-06-14 13:58, Jakub Jelinek wrote:
>>> +  if (!supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt,
>>> +  vecwtype, vectype,
>>> +  &dummy, &dummy, &dummy_code,
>>> +  &dummy_code, &dummy_int, &dummy_vec))
>>> +return NULL;
>>
>>
>> It would be nice to be able to handle high-part multiplies as well, e.g. 
>> VEC_WIDEN_MULT_HI_EXPR.  Which is what Altivec provides, and not 
>> VEC_WIDEN_MULT.
> 
> Sure, but we don't have a tree code for that right now, do we?
> VEC_WIDEN_MULT_HI_EXPR is just one half of the widened multiply results,
> not all the high halves of the widened multiply.

Actually, it is all the high parts of the multiply results.  The comment
in tree.def is incorrect.  Likewise MULT_LO_EXPR is the low parts (and
fully redundant with plain MULT_EXPR, really).

> For 16-bit multiplication we could also use {,V}PMULH{,U}W
> (for 32-bit multiplication we use two {,V}PMUL{,U}DQ plus shifts afterwards).

Well, a single interleave, not shifts, but yes.


r~
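
To make the "high part" terminology concrete, here is a scalar sketch of what a 16-bit unsigned multiply-high computes per element and how division by a constant can be built on it. The function names are hypothetical, and the constant 43691 (2^17 / 3 rounded up) is the usual textbook choice for dividing a 16-bit unsigned value by 3; it is not taken from the patch under review.

```c
#include <stdint.h>

/* High 16 bits of the 32-bit product, i.e. what an unsigned
   multiply-high operation yields for one element.  */
static uint16_t
mulhi_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * b) >> 16);
}

/* x / 3 for any 16-bit unsigned x: multiply-high by 43691, then
   shift right by one more bit (a total shift of 17).  */
uint16_t
div3_u16 (uint16_t x)
{
  return mulhi_u16 (x, 43691) >> 1;
}
```

A vectorized version would apply the same two steps to every lane, which is why a tree code meaning "all the high halves" matters here.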




Re: [PATCH] Fix PR53708

2012-06-19 Thread Iain Sandoe

On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote:

> On Tue, 19 Jun 2012, Richard Guenther wrote:
>> 
>>> Richard Guenther  writes:
>>>> We are too eager to bump alignment of some decls when vectorizing.
>>>> The fix is to not bump alignment of decls the user explicitly
>>>> aligned or that are used in an unknown way.
>>> 
>>> I thought attribute((__aligned__)) only set a minimum alignment for
>>> variables?  Most uses I've seen have been trying to get better
>>> performance from higher alignment, so it might not go down well if the
>>> attribute stopped the vectoriser from increasing the alignment still
>>> further.
>> 
>> That's what the documentation says indeed.  I'm not sure which part of
>> the patch fixes the ObjC failures where the alignment is part of the ABI
>> (and I suppose ObjC then mis-uses the aligned attribute?).
> 
> A quick test shows that 
> 
> if (DECL_PRESERVE_P (decl))
> 
> alone is enough to fix the objc failures, while they are still there if 
> one uses only
> 
> if (DECL_USER_ALIGN (decl))

That makes sense, I had a quick look at the ObjC code, and it appears that the 
explicit ALIGNs were never committed to trunk.

Thus, the question becomes; what should ObjC (or any other) FE do to ensure 
that specific ABI (upper) alignment constraints are met?

Iain
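
As a small illustration of the "minimum alignment" reading discussed above: the sketch below requests 16-byte alignment with the aligned attribute and checks that the address honours it. The names table and table_is_16_byte_aligned are hypothetical. Note what the check cannot show: it says nothing about whether the compiler raised the alignment further, which is the ABI concern in this thread.

```c
#include <stdint.h>

/* Request at least 16-byte alignment for this object.  */
static int table[4] __attribute__ ((aligned (16)));

/* Nonzero if the address of table is a multiple of 16.  */
int
table_is_16_byte_aligned (void)
{
  return ((uintptr_t) table % 16) == 0;
}
```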



Re: [patch] Deal with #ident without

2012-06-19 Thread Steven Bosscher
On Thu, Jun 7, 2012 at 11:22 AM, Richard Guenther
 wrote:
> On Thu, Jun 7, 2012 at 8:16 AM, Andreas Schwab  wrote:
>> Steven Bosscher  writes:
>>
>>> Index: doc/tm.texi
>>> ===
>>> --- doc/tm.texi       (revision 188182)
>>> +++ doc/tm.texi       (working copy)
>>> @@ -5847,6 +5847,10 @@ value is 0.
>>>  @end deftypevr
>>>
>>>  @deftypefn {Target Hook} void TARGET_ASM_OUTPUT_ANCHOR (rtx @var{x})
>>> +
>>> +@deftypefn {Target Hook} void TARGET_ASM_OUTPUT_IDENT (const char 
>>> *@var{name})
>>> +Generate a string based on @var{name}, suitable for the @samp{#ident}  
>>> directive, or the equivalent directive or pragma in non-C-family languages. 
>>>  If this hook is not defined, nothing is output for the @samp{#ident}  
>>> directive.
>>> +@end deftypefn
>>
>> That looks misplaced.
>
> Ok after double-checking the above.

I've now committed this, see r188791.

Ciao!
Steven


Re: [patch] Fix PR48109 using artificial top-level asm statements (darwin/objc)

2012-06-19 Thread Mike Stump
On Jun 18, 2012, at 10:51 AM, Steven Bosscher  wrote:
> This patch started as an attempt to remove #include "output.h" from
> objc/: Instead of writing references directly to asm_out_file, the
> references are output as top-level asm statements.

>  OK for trunk?

Ok.


Re: [patch] Use IDENTIFIER_LENGTH instead of strlen(IDENTIFIER_POINTER) in a few places

2012-06-19 Thread Mike Stump
On Jun 18, 2012, at 8:55 AM, Steven Bosscher  wrote:
> Obvious enough
> 
> objc/
>* objc-encoding.c (encode_aggregate_fields): Use IDENTIFIER_LENGTH
>instead of strlen(IDENTIFIER_POINTER).
>(encode_aggregate_within): Likewise.

Ok.
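
The idea behind the change, sketched outside of GCC: when a string object already stores its length, reading the stored length avoids rescanning the bytes with strlen. Everything below (struct ident, IDENT_POINTER, IDENT_LENGTH, append_ident) is a hypothetical stand-in, not the tree accessors used in objc-encoding.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length is stored next to the bytes.  */
struct ident
{
  const char *ptr;
  size_t len;
};

#define IDENT_POINTER(id) ((id)->ptr)
#define IDENT_LENGTH(id)  ((id)->len)

/* Copy the identifier into DST and NUL-terminate it, using the stored
   length instead of strlen (IDENT_POINTER (id)).  DST must have room
   for IDENT_LENGTH (id) + 1 bytes.  Returns the number of bytes copied,
   not counting the terminator.  */
size_t
append_ident (char *dst, const struct ident *id)
{
  memcpy (dst, IDENT_POINTER (id), IDENT_LENGTH (id));
  dst[IDENT_LENGTH (id)] = '\0';
  return IDENT_LENGTH (id);
}
```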


Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant

2012-06-19 Thread Jiří Hruška
On Tue, Jun 19, 2012 at 8:59 PM, Andreas Schwab  wrote:
> Declare them as arrays.
> extern const int __start_foo[], __stop_foo[];
Thanks, that's a good suggestion, cleans the code nicely!
(Though, of course, both ways work here and the strange things happen
only in the 3rd testcase, which does not use these special variables.)
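
For reference, a sketch of the first testcase with Andreas's array declarations applied. It assumes GNU ld on an ELF target, which synthesizes the __start_foo and __stop_foo symbols for a section named foo; the function name sum_foo is hypothetical. Summing instead of printing keeps the result independent of the order in which the three objects are placed.

```c
__attribute__((section("foo"))) const int foo1 = 1;
__attribute__((section("foo"))) const int foo2 = 2;
__attribute__((section("foo"))) const int foo3 = 3;

/* Provided by the linker; declared as arrays of unknown bound so that
   walking from one to the other is ordinary array traversal.  */
extern const int __start_foo[], __stop_foo[];

/* Add up every int the linker collected into section foo.  */
int
sum_foo (void)
{
  int sum = 0;
  const int *p;

  for (p = __start_foo; p != __stop_foo; p++)
    sum += *p;
  return sum;
}
```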


Re: [PATCH] Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)

2012-06-19 Thread Mikael Pettersson
Richard Guenther writes:
 > On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand  wrote:
 > > Richard Guenther wrote:
 > >> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand  
 > >> wrote:
 > >> > However, there is a second case where we need to check every pass: if
 > >> > we're not actually vectorizing any loop, but are performing basic-block
 > >> > SLP.  In this case, it would appear that we need the same check as
 > >> > described in the comment above, i.e. to verify that the stride is a
 > >> > multiple of the vector size.
 > >> >
 > >> > The patch below adds this check, and this indeed fixes the invalid 
 > >> > access
 > >> > I was seeing in the test case (in the final assembler, we now get a
 > >> > vld1.16 instead of vldr).
 > >> >
 > >> > Tested on arm-linux-gnueabi with no regressions.
 > >> >
 > >> > OK for mainline?
 > >>
 > >> Ok.
 > >
 > > Thanks for the quick review; I've checked this in to mainline now.
 > >
 > > I just noticed that the test case also crashes on 4.7, but not on 4.6.
 > >
 > > Would a backport to 4.7 also be OK, once testing passes?
 > 
 > Yes.  Please leave it on mainline a few days to catch fallout from
 > autotesters.

This patch caused

FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized 
using SLP" 1

on sparc64-linux.  Comparing the pre and post patch dumps for that file shows

 22: vect_compute_data_ref_alignment:
 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B]
 22: vect_compute_data_ref_alignment:
-22: force alignment of arr[i_87]
-22: misalign = 0 bytes of ref arr[i_87]
+22: SLP: step doesn't divide the vector-size.
+22: Unknown alignment for access: arr

(lots of stuff that's simply gone)

-22: BASIC BLOCK VECTORIZED
-
-22: basic block vectorized using SLP
+22: not vectorized: unsupported unaligned store.arr[i_87]
+22: not vectorized: unsupported alignment in basic block.

/Mikael
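
The condition described in the quoted text, that the stride must be a multiple of the vector size, can be sketched as a plain predicate. This is a hypothetical helper for illustration only, not the code in the vectorizer, and the byte counts in the usage note are made up rather than taken from the sparc64 dump.

```c
/* Hypothetical helper: nonzero if advancing by STEP_BYTES keeps an
   access that starts vector-aligned aligned to VECTOR_BYTES.  */
int
step_preserves_alignment (long step_bytes, long vector_bytes)
{
  return vector_bytes > 0 && step_bytes % vector_bytes == 0;
}
```

With 8-byte vectors, a 16-byte step passes the check while a 4-byte step does not.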


Re: [patch] Fix failing nested-3.C on ARM.

2012-06-19 Thread Mike Stump
On Jun 19, 2012, at 2:18 AM, Richard Earnshaw  wrote:
> The regexp in nested-3.C has to parse the machine-specific comment
> character; on ARM that is '@'.
> 
> Tested on arm-eabi, where this test now passes.
> 
> OK?

Ok.


Re: [PATCH] Fix PR53708

2012-06-19 Thread Mike Stump
On Jun 19, 2012, at 12:22 PM, Iain Sandoe  wrote:
> On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote:
> 
>> On Tue, 19 Jun 2012, Richard Guenther wrote:
>>> 
>>>> Richard Guenther  writes:
>>>>> We are too eager to bump alignment of some decls when vectorizing.
>>>>> The fix is to not bump alignment of decls the user explicitly
>>>>> aligned or that are used in an unknown way.
>>>> 
>>>> I thought attribute((__aligned__)) only set a minimum alignment for
>>>> variables?  Most uses I've seen have been trying to get better
>>>> performance from higher alignment, so it might not go down well if the
>>>> attribute stopped the vectoriser from increasing the alignment still
>>>> further.
>>> 
>>> That's what the documentation says indeed.  I'm not sure which part of
>>> the patch fixes the ObjC failures where the alignment is part of the ABI
>>> (and I suppose ObjC then mis-uses the aligned attribute?).
>> 
>> A quick test shows that 
>> 
>> if (DECL_PRESERVE_P (decl))
>> 
>> alone is enough to fix the objc failures, while they are still there if 
>> one uses only
>> 
>> if (DECL_USER_ALIGN (decl))
> 
> That makes sense, I had a quick look at the ObjC code, and it appears that 
> the explicit ALIGNs were never committed to trunk.
> 
> Thus, the question becomes; what should ObjC (or any other) FE do to ensure 
> that specific ABI (upper) alignment constraints are met?

Hum, upper is easy...  I thought the issue was that extra alignment would kill 
it?  I know that extra alignment does kill some of the objc metadata.


Re: [PATCH] Fix PR53708

2012-06-19 Thread Mike Stump
On Jun 19, 2012, at 5:53 AM, domi...@lps.ens.fr (Dominique Dhumieres) wrote:
> On Tue, 19 Jun 2012, Richard Guenther wrote:
>> 
>>> Richard Guenther  writes:
>>>> We are too eager to bump alignment of some decls when vectorizing.
>>>> The fix is to not bump alignment of decls the user explicitly
>>>> aligned or that are used in an unknown way.
>>> 
>>> I thought attribute((__aligned__)) only set a minimum alignment for
>>> variables?  Most uses I've seen have been trying to get better
>>> performance from higher alignment, so it might not go down well if the
>>> attribute stopped the vectoriser from increasing the alignment still
>>> further.
>> 
>> That's what the documentation says indeed.  I'm not sure which part of
>> the patch fixes the ObjC failures where the alignment is part of the ABI
>> (and I suppose ObjC then mis-uses the aligned attribute?).
> 
> A quick test shows that 
> 
> if (DECL_PRESERVE_P (decl))
> 
> alone is enough to fix the objc failures,

Sounds good to me.  It seems ok to me for the optimizer to bump up the
alignment on things that aren't special.  DECL_PRESERVE seems like a
reasonable way to declare they are special.


Re: [testsuite] profopt.exp and friends: use expected list of options

2012-06-19 Thread Mike Stump
On Jun 18, 2012, at 4:51 PM, Janis Johnson  wrote:
> There are tests in g++.tree-prof that have non-unique lines in test
> summaries for scan-*-dump checks.  Investigation showed that these tests
> were being run multiple times, for a list of options that had leaked
> over from another set of profile-directed optimization tests.

> This patch makes
> it use [ { -O2 } {-O3 } ] so the options tested there will get some
> coverage with optimization, although not as much as originally planned
> when the tests were added years and years ago.

Sounds ok to me, but I'd be happy to have a prof champion chime in, if they 
disagree.

> OK for mainline?

Ok, with the caveat that I'll defer to a prof champion.


Re: [PATCH] Fix PR53708

2012-06-19 Thread Iain Sandoe

On 19 Jun 2012, at 22:41, Mike Stump wrote:

> On Jun 19, 2012, at 12:22 PM, Iain Sandoe  wrote:
>> On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote:
>> 
>>> On Tue, 19 Jun 2012, Richard Guenther wrote:
>>>> 
>>>>> Richard Guenther  writes:
>>>>>> We are too eager to bump alignment of some decls when vectorizing.
>>>>>> The fix is to not bump alignment of decls the user explicitly
>>>>>> aligned or that are used in an unknown way.
>>>>> 
>>>>> I thought attribute((__aligned__)) only set a minimum alignment for
>>>>> variables?  Most uses I've seen have been trying to get better
>>>>> performance from higher alignment, so it might not go down well if the
>>>>> attribute stopped the vectoriser from increasing the alignment still
>>>>> further.
>>>> 
>>>> That's what the documentation says indeed.  I'm not sure which part of
>>>> the patch fixes the ObjC failures where the alignment is part of the ABI
>>>> (and I suppose ObjC then mis-uses the aligned attribute?).
>>> 
>>> A quick test shows that 
>>> 
>>> if (DECL_PRESERVE_P (decl))
>>> 
>>> alone is enough to fix the objc failures, while they are still there if 
>>> one uses only
>>> 
>>> if (DECL_USER_ALIGN (decl))
>> 
>> That makes sense, I had a quick look at the ObjC code, and it appears that 
>> the explicit ALIGNs were never committed to trunk.
>> 
>> Thus, the question becomes; what should ObjC (or any other) FE do to ensure 
>> that specific ABI (upper) alignment constraints are met?
> 
> Hum, upper is easy...  I thought the issue was that extra alignment would 
> kill it?  I know that extra alignment does kill some of the objc metadata.

Clearly, ambiguous phrasing on my part.
I mean when we want to say "no more than this much".





Fix e500 vector ICE with string constants

2012-06-19 Thread Joseph S. Myers
On some tests involving storing a pointer to a string constant in a
vector, on powerpc with SPE vectors, an ICE occurs of the form:

t2.c: In function 'f':
t2.c:7:1: error: unrecognizable insn:
 }
 ^
(insn 9 8 10 2 (set (subreg:SI (reg:V2SI 125 [ D.1618 ]) 4)
(lo_sum:SI (reg:SI 126)
(symbol_ref/f:SI ("*.LC0") [flags 0x82] ))) t2.c:6 -1
 (nil))
t2.c:7:1: internal compiler error: in extract_insn, at recog.c:2130
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

The patterns to set individual words of SPE vectors only allow
input_operand and do not allow for the LO_SUM constructs used for
pointers to strings.  This patch fixes things by adding further
patterns for the LO_SUM case.  (It's possible the issue could also
arise with the patterns for subregs of TFmode at offset 8 and 12, but
I couldn't get the compiler to generate stores of string constant
pointers to such subregs.)

The original test I had for this issue in a 4.6-based compiler
simplified to

char *a1[20];
int a2[20];
char a3[1];

void
f (void)
{
  int i;
  for (i = 1; i < 20; i++)
{
  a1[i] = "";
  a2[i] = 0;
}
}

with -O3, where the vectors were generated internally, but that
doesn't ICE with trunk, so I created the synthetic testcases in this
patch that do ICE with trunk.

Tested with no regressions with cross to powerpc-eabispe.  OK to
commit?

2012-06-19  Joseph Myers  

* config/rs6000/spe.md (*mov_si_e500_subreg0): Rename to
mov_si_e500_subreg0.
(*mov_si_e500_subreg0_elf_low)
(*mov_si_e500_subreg4_elf_low): New patterns.

testsuite:
2012-06-19  Joseph Myers  

* gcc.c-torture/compile/vector-5.c,
gcc.c-torture/compile/vector-6.c: New tests.

Index: gcc/testsuite/gcc.c-torture/compile/vector-5.c
===
--- gcc/testsuite/gcc.c-torture/compile/vector-5.c  (revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/vector-5.c  (revision 0)
@@ -0,0 +1,7 @@
+typedef int v2si __attribute__((__vector_size__(8)));
+
+v2si
+f (int x)
+{
+  return (v2si) { x, (__INTPTR_TYPE__) "" };
+}
Index: gcc/testsuite/gcc.c-torture/compile/vector-6.c
===
--- gcc/testsuite/gcc.c-torture/compile/vector-6.c  (revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/vector-6.c  (revision 0)
@@ -0,0 +1,7 @@
+typedef int v2si __attribute__((__vector_size__(8)));
+
+v2si
+f (int x)
+{
+  return (v2si) { (__INTPTR_TYPE__) "", x };
+}
Index: gcc/config/rs6000/spe.md
===
--- gcc/config/rs6000/spe.md(revision 188753)
+++ gcc/config/rs6000/spe.md(working copy)
@@ -1,5 +1,5 @@
 ;; e500 SPE description
-;; Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009
+;; Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011, 2012
 ;; Free Software Foundation, Inc.
 ;; Contributed by Aldy Hernandez (a...@quesejoda.com)
 
@@ -2329,7 +2329,7 @@
   "evmergehi %0,%1,%1\;mr %L0,%1\;evmergehi %Y0,%L1,%L1\;mr %Z0,%L1"
   [(set_attr "length" "16")])
 
-(define_insn "*mov_si_e500_subreg0"
+(define_insn "mov_si_e500_subreg0"
   [(set (subreg:SI (match_operand:SPE64TF 0 "register_operand" "+r,&r") 0)
(match_operand:SI 1 "input_operand" "r,m"))]
   "(TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode))
@@ -2339,6 +2339,24 @@
evmergelohi %0,%0,%0\;{l%U1%X1|lwz%U1%X1} %0,%1\;evmergelohi %0,%0,%0"
   [(set_attr "length" "4,12")])
 
+(define_insn_and_split "*mov_si_e500_subreg0_elf_low"
+  [(set (subreg:SI (match_operand:SPE64TF 0 "register_operand" "+r") 0)
+   (lo_sum:SI (match_operand:SI 1 "gpc_reg_operand" "r")
+  (match_operand 2 "" "")))]
+  "((TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode))
+|| (TARGET_SPE && mode != DFmode && mode != TFmode))
+   && TARGET_ELF && !TARGET_64BIT && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_elf_low (tmp, operands[1], operands[2]));
+  emit_insn (gen_mov_si_e500_subreg0 (operands[0], tmp));
+  DONE;
+}
+  [(set_attr "length" "8")])
+
 ;; ??? Could use evstwwe for memory stores in some cases, depending on
 ;; the offset.
 (define_insn "*mov_si_e500_subreg0_2"
@@ -2360,6 +2378,15 @@
mr %0,%1
{l%U1%X1|lwz%U1%X1} %0,%1")
 
+(define_insn "*mov_si_e500_subreg4_elf_low"
+  [(set (subreg:SI (match_operand:SPE64TF 0 "register_operand" "+r") 4)
+   (lo_sum:SI (match_operand:SI 1 "gpc_reg_operand" "r")
+  (match_operand 2 "" "")))]
+  "((TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode))
+|| (TARGET_SPE && mode != DFmode && mode != TFmode))
+   && TARGET_ELF && !TARGET_64BIT"
+  "{ai|addic} %0,%1,%K2")
+
 (define_insn "*mov_si_e500_subreg4_2"
   [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" "+r,m")
(subreg:SI (match_operand:SP

[patch][PCH] Do not write/read asm_out_file, take 2

2012-06-19 Thread Steven Bosscher
Hello,

The attached patch removes one more #include output.h, this time from
c-family/c-pch.c.

Anything written out to asm_out_file between pch_init and
c_common_write_pch is read back in by c_common_write_pch and dumped to
the PCH that's being written out. In c_common_read_pch this data is
written out verbatim to asm_out_file again.

But nothing should write to asm_out_file between pch_init and
c_common_write_pch. I suppose this happened before unit-at-a-time
became the only supported compilation mode, but these days there's
nothing, AFAICT, that should be written to asm_out_file by a front end
during PCH generation.

This patch was bootstrapped&tested on powerpc64-unknown-linux-gnu.
The issues with #ident have already been addressed, and this patch
adds a new test case, to make sure...

OK for trunk?

Ciao!
Steven


01_c_pch_no_asm_out_file.diff
Description: Binary data


Re: Fix e500 vector ICE with string constants

2012-06-19 Thread David Edelsohn
On Tue, Jun 19, 2012 at 5:56 PM, Joseph S. Myers
 wrote:

> 2012-06-19  Joseph Myers  
>
>        * config/rs6000/spe.md (*mov_si_e500_subreg0): Rename to
>        mov_si_e500_subreg0.
>        (*mov_si_e500_subreg0_elf_low)
>        (*mov_si_e500_subreg4_elf_low): New patterns.
>
> testsuite:
> 2012-06-19  Joseph Myers  
>
>        * gcc.c-torture/compile/vector-5.c,
>        gcc.c-torture/compile/vector-6.c: New tests.

Okay.

Thanks, David


[patch committed testsuite] Tweak gcc.dg/stack-usage-1.c on SH

2012-06-19 Thread Kaz Kojima
Hi,

I've applied the attached patch which is a tiny SH specific
change of gcc.dg/stack-usage-1.c test.  Tested on sh-linux
and i686-pc-linux-gnu.

Regards,
kaz
--
2012-06-19  Kaz Kojima  

* gcc.dg/stack-usage-1.c: Use sh*-*-* instead of sh-*-*.

--- ORIG/trunk/gcc/testsuite/gcc.dg/stack-usage-1.c 2012-06-16 09:29:54.0 +0900
+++ trunk/gcc/testsuite/gcc.dg/stack-usage-1.c  2012-06-19 07:55:54.0 +0900
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-fstack-usage" } */
-/* { dg-options "-fstack-usage -fomit-frame-pointer" { target { sh-*-* } } } */
+/* { dg-options "-fstack-usage -fomit-frame-pointer" { target { sh*-*-* } } } */
 
 /* This is aimed at testing basic support for -fstack-usage in the back-ends.
See the SPARC back-end for example (grep flag_stack_usage_info in sparc.c).


[patch][ARM] Do not include output.h in arm-c.c

2012-06-19 Thread Steven Bosscher
Hello,

Only a few front-end files to go that need output.h, and some of them
are in the c_target_objs: arm, mep, m32c, and rl78.

This patch tackles the ARM case.  arm-c.c needs output.h because
EMIT_EABI_ATTRIBUTE wants to print to asm_out_file. Solved by
replacing EMIT_EABI_ATTRIBUTE with a function
arm.c:arm_emit_eabi_attribute.

Tested by building a cross-compiler from powerpc64-unknown-linux-gnu X
arm-eabi, and comparing assembly on a set of files.
OK for trunk?

Ciao!
Steven


arm_C_no_output_h.diff
Description: Binary data


Re: [RFC 0/3] Stuff related to pr53533

2012-06-19 Thread Matt

On 2012-06-15 13:57, Richard Henderson wrote:

> Bootstrapped and tested on x86_64, but I'll leave some time for
> comment before committing any of this.



Patches now committed.


Hey Richard,

Thanks for taking on some of these issues. I'm not seeing much of an 
improvement yet when manually applying the patches to 4.7, but it looks 
like steps in the right direction. Having to turn off vectorization to 
approximate previous compiler performance was disappointing given it's 
supposed to give us a boost on some of these architectures ;)


Would it be possible to commit these to 4_7-branch as well? (One of the 
patches looks relevant to 4.6 as well, and applied cleanly, but I haven't 
tested to see if it had a noticeable effect.)


Thanks again!


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


[cxx-conversion] Remove option to build without a C++ compiler (issue6296093)

2012-06-19 Thread Diego Novillo
Remove option to build without a C++ compiler.

This patch removes all the configuration code that allowed GCC to
build without a C++ compiler.  After this patch the following
configuration flags are no longer valid:

--enable-build-with-cxx
--enable-build-poststage1-with-cxx

All builds will unconditionally use C++.

Tested on x86_64.

Ian, could you please take a look to double check I have not missed
anything?  There was more code dealing with it than I was expecting.

I'm also not sure how to propagate the changes in go/gofrontend, but
we don't need to worry about that until we do the actual merge into
trunk.

Thanks.  Diego.


2012-06-19   Diego Novillo  

ChangeLog.cxx-conversion
* Makefile.tpl (STAGE[+id+]_CXXFLAGS): Remove
POSTSTAGE1_CONFIGURE_FLAGS.
* Makefile.in: Regenerate.
* configure.ac (ENABLE_BUILD_WITH_CXX): Remove.  Update all users.
* configure: Regenerate.

gcc/ChangeLog.cxx-conversion
* Makefile.in: Remove all handlers of ENABLE_BUILD_WITH_CXX.
* configure.ac: Likewise.
* configure: Regenerate.
* config.in: Regenerate.
* doc/install.texi: Remove documentation for --enable-build-with-cxx
and --enable-build-poststage1-with-cxx.

gcc/go/ChangeLog.cxx-conversion
* go-c.h: Remove all handlers of ENABLE_BUILD_WITH_CXX.
* go-gcc.cc: Likewise.
* go-system.h: Likewise.

libcpp/ChangeLog.cxx-conversion
* Makefile.in: Remove all handlers of ENABLE_BUILD_WITH_CXX.
* configure.ac: Likewise.
* configure: Regenerate.

diff --git a/Makefile.in b/Makefile.in
index def860e..d81fb97 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -422,7 +422,6 @@ TFLAGS =
 STAGE_CFLAGS = $(BOOT_CFLAGS)
 STAGE_TFLAGS = $(TFLAGS)
 STAGE_CONFIGURE_FLAGS=@stage2_werror_flag@
-POSTSTAGE1_CONFIGURE_FLAGS = @POSTSTAGE1_CONFIGURE_FLAGS@
 
 
 # Defaults for stage 1; some are overridden below.
@@ -433,10 +432,7 @@ STAGE1_CXXFLAGS = $(CXXFLAGS)
 STAGE1_CXXFLAGS = $(STAGE1_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGE1_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGE1_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGE1_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage 2; some are overridden below.
 STAGE2_CFLAGS = $(STAGE_CFLAGS)
@@ -446,10 +442,7 @@ STAGE2_CXXFLAGS = $(CXXFLAGS)
 STAGE2_CXXFLAGS = $(STAGE2_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGE2_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGE2_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGE2_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage 3; some are overridden below.
 STAGE3_CFLAGS = $(STAGE_CFLAGS)
@@ -459,10 +452,7 @@ STAGE3_CXXFLAGS = $(CXXFLAGS)
 STAGE3_CXXFLAGS = $(STAGE3_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGE3_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGE3_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGE3_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage 4; some are overridden below.
 STAGE4_CFLAGS = $(STAGE_CFLAGS)
@@ -472,10 +462,7 @@ STAGE4_CXXFLAGS = $(CXXFLAGS)
 STAGE4_CXXFLAGS = $(STAGE4_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGE4_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGE4_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGE4_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage profile; some are overridden below.
 STAGEprofile_CFLAGS = $(STAGE_CFLAGS)
@@ -485,10 +472,7 @@ STAGEprofile_CXXFLAGS = $(CXXFLAGS)
 STAGEprofile_CXXFLAGS = $(STAGEprofile_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGEprofile_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGEprofile_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGEprofile_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage feedback; some are overridden below.
 STAGEfeedback_CFLAGS = $(STAGE_CFLAGS)
@@ -498,10 +482,7 @@ STAGEfeedback_CXXFLAGS = $(CXXFLAGS)
 STAGEfeedback_CXXFLAGS = $(STAGEfeedback_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGEfeedback_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGEfeedback_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGEfeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 
 # Only build the C compiler for stage1, because that is the only one that
@@ -519,9 +500,6 @@ STAGE1_LANGUAGES = @stag

Re: [patch] Deal with #ident without

2012-06-19 Thread Hans-Peter Nilsson
On Tue, 19 Jun 2012, Steven Bosscher wrote:
> I've now committed this, see r188791.

Breaking cris-elf.  Just try rebuilding cc1:
./gcc/gcc/../libdecnumber/dpd -I../libdecnumber\
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c -o cris.o
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 
'cris_asm_output_ident':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: 'cgraph_state' 
undeclared (first use in this function)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: (Each undeclared 
identifier is reported only once
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: for each function 
it appears in.)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: 
'CGRAPH_STATE_PARSING' undeclared (first use in this funct\
ion)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2478: warning: unused variable 
'buf'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2477: warning: unused variable 
'size'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2476: warning: unused variable 
'section_asm_op'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 
'cris_option_override':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2538: error: 
'flag_no_gcc_ident' undeclared (first use in this function\
)
make[2]: *** [cris.o] Error 1

brgds, H-P


Re: [RFC 0/3] Stuff related to pr53533

2012-06-19 Thread Richard Henderson
On 2012-06-19 15:55, Matt wrote:
> On 2012-06-15 13:57, Richard Henderson wrote:
>> > Bootstrapped and tested on x86_64, but I'll leave some time for
>> > comment before committing any of this.
> 
>> Patches now committed.
> 
> Hey Richard,
> 
> Thanks for taking on some of these issues. I'm not seeing much of an
> improvement yet when manually applying the patches to 4.7...

Of course not.  None of them address the real problem.  They merely
fix warts discovered along the way.

> Would it be possible to commit these to 4_7-branch as well?

No, I don't think so.


r~


Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)

2012-06-19 Thread Jason Merrill

On 06/19/2012 10:12 AM, Sterling Augustine wrote:

+ /* If we're putting types in their own .debug_types sections,
+the .debug_pubtypes table will still point to the compile
+unit (not the type unit), so we want to use the offset of
+the skeleton DIE (if there is one).  */
+ if (pub->die->comdat_type_p && names == pubtype_table)
+   {
+ comdat_type_node_ref type_node = pub->die->die_id.die_type_node;
+
+ if (type_node != NULL && type_node->skeleton_die != NULL)
+   die_offset = type_node->skeleton_die->die_offset;
+   }


I think we had agreed that if there is no skeleton, we should use an 
offset of 0.
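
A minimal sketch of that rule, assuming simplified stand-ins for the dwarf2out structures (the sketch_* types and function are hypothetical and only mirror the field names in the quoted hunk):

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the DIE and comdat type
   node structures referenced in the quoted hunk.  */
struct sketch_die
{
  unsigned long die_offset;
};

struct sketch_type_node
{
  struct sketch_die *skeleton_die;
};

/* Offset to record in the pubtypes table for a type that lives in
   its own type unit: the skeleton DIE's offset when a skeleton
   exists, and 0 otherwise.  */
unsigned long
sketch_pubtype_offset (const struct sketch_type_node *type_node)
{
  if (type_node != NULL && type_node->skeleton_die != NULL)
    return type_node->skeleton_die->die_offset;
  return 0;
}
```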


Jason


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-06-19 Thread Sriraman Tallam
Ping.

On Thu, Jun 14, 2012 at 1:13 PM, Sriraman Tallam  wrote:
> +cc c++ front-end maintainers
>
> Hi,
>
>   C++ Frontend maintainers, Could you please take a look at the
> front-end part when you find the time?
>
>   Honza, your thoughts on the callgraph part?
>
>   Richard, any further comments/feedback?
>
>   Additionally, I am working on generating better mangled names for
> function versions, along the lines of C++ thunks.
>
> Thanks,
> -Sri.
>
> On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam  wrote:
>> Hi,
>>
>>   Attaching updated patch for function multiversioning which brings
>> in plenty of changes.
>>
>> * As suggested by Richard earlier, I have made cgraph aware of
>> function versions. All nodes of function versions are chained and the
>> dispatcher bodies are created on demand while building cgraph edges.
>> The dispatcher body will be created if and only if there is a call or
>> reference to a versioned function. Previously, I was maintaining the
>> list of versions separately in a hash map, all that is gone now.
>> * Now, the file multiverison.c has some helper routines that are used
>> in the context of function versioning. There are no new passes and no
>> new globals.
>> * More tests, updated existing tests.
>> * Fixed lots of bugs.
>> * Updated patch description.
>>
>> Patch attached. Patch also available for review at
>> http://codereview.appspot.com/5752064
>>
>> Please let me know what you think,
>>
>> Thanks,
>> -Sri.
>>
>>
>> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam  
>> wrote:
>>> Hi H.J,
>>>
>>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
>>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
>>> not needed as they are mutually exclusive, any order should be fine.
>>>
>>> Patch also available for review here:  http://codereview.appspot.com/5752064
>>>
>>> Thanks,
>>> -Sri.
>>>
>>> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu  wrote:
>>>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam  wrote:
>>>>> Hi H.J.,
>>>>>
>>>>>   I have updated the patch to improve the dispatching method like we
>>>>> discussed. Each feature gets a priority now, and the dispatching is
>>>>> done in priority order. Please see i386.c for the changes.
>>>>>
>>>>> Patch also available for review here:
>>>>> http://codereview.appspot.com/5752064
>>>>>
>>>>
>>>> I think you need 3 tests:
>>>>
>>>> 1.  Only with ISA.
>>>> 2.  Only with arch
>>>> 3.  Mixed with ISA and arch
>>>>
>>>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>>>
>>>> --
>>>> H.J.
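
The priority-ordered dispatching mentioned in the quoted thread could look roughly like the sketch below. This is not the i386.c code from the patch; the structure, the "supported" callbacks and all sketch_* names are assumptions made for illustration.

```c
#include <stddef.h>

typedef int (*sketch_fn) (void);

/* One candidate version of a function.  */
struct sketch_version
{
  unsigned priority;        /* Higher priority is tried first.  */
  int (*supported) (void);  /* Hypothetical runtime feature check.  */
  sketch_fn fn;             /* The version to call if supported.  */
};

/* Return the supported version with the highest priority, or DFLT
   when no version is supported on this machine.  */
sketch_fn
sketch_dispatch (const struct sketch_version *v, size_t n, sketch_fn dflt)
{
  const struct sketch_version *best = NULL;
  size_t i;

  for (i = 0; i < n; i++)
    if (v[i].supported ()
        && (best == NULL || v[i].priority > best->priority))
      best = &v[i];
  return best != NULL ? best->fn : dflt;
}

/* Tiny stand-ins so the dispatcher can be exercised.  */
int sketch_yes (void) { return 1; }
int sketch_no (void) { return 0; }
int sketch_impl_default (void) { return 0; }
int sketch_impl_a (void) { return 1; }
int sketch_impl_b (void) { return 2; }
```

Testing ISA-only, arch-only and mixed sets separately, as H.J. asks, amounts to checking that the priority order is right within each group and across groups.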


Re: [PATCH, MIPS] Add most common atomic patterns

2012-06-19 Thread Maxim Kuvyrkov
I've now checked in these patches.

Tom, thanks for the great work optimizing sync and atomic builtins for MIPS 
and XLP, and, Richard, thanks for the reviews and the education on writing 
good .md descriptions.

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



On 13/06/2012, at 5:50 PM, Maxim Kuvyrkov wrote:

> This patch series adds necessary patterns for __atomic_compare_exchange[_n], 
> __atomic_exchange[_n] and __atomic_fetch_add builtins.  These are the 
> builtins that correspond to inline assembly that MIPS GLIBC port is using.
> 
> The patches were originally developed by Tom de Vries a while ago, and I've 
> rewritten parts of them to be better suited for upstream.
> 
> The second patch adds XLP-specific patterns to support its swap and ldadd 
> instructions.  Unfortunately, there seems to be a problem in reload that 
> prevents reload from properly spilling address for these two patterns.  I 
> will work with reload experts on investigating and fixing this problem, but, 
> meanwhile, the patch contains a workaround that avoids the problem.
> 
> The third patch is a small optimization to alleviate the cost of the 
> __atomic_compare_exchange[_n] builtins being a one-size-fits-all solution.  
> These builtins return both boolean "success" and "oldval" results.  As most 
> cases use only one of the results, this optimization looks at REG_UNUSED 
> notes to determine if instructions to set these results can be omitted.
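
To make the "only one result is used" case concrete, here is a small sketch using the GCC __atomic_compare_exchange_n builtin. The two wrapper functions are hypothetical examples, not code from the patch series: the first consumes only the boolean result, the second only the old value.

```c
#include <stdbool.h>

/* Try to take a simple lock.  Only the boolean "success" result is
   used; the value written back to EXPECTED is ignored.  */
bool
sketch_try_lock (int *lock)
{
  int expected = 0;

  return __atomic_compare_exchange_n (lock, &expected, 1, false,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Compare-and-swap that reports the value found in memory.  Only the
   "oldval" result is used; the boolean is ignored.  On failure the
   builtin stores the current contents of *PTR into OLDVAL, and on
   success OLDVAL already equals what was in memory.  */
int
sketch_cas_old (int *ptr, int oldval, int newval)
{
  __atomic_compare_exchange_n (ptr, &oldval, newval, false,
                               __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return oldval;
}
```

In each wrapper one of the two outputs is dead, which is the situation the REG_UNUSED check described above is meant to detect.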
> 
> The patch series was tested by running GLIBC testsuite for n32, n64 and o32 
> ABIs on XLP and [in-progress] non-XLP MIPS boards with no regressions with a 
> corresponding patch to MIPS GLIBC port to use the new atomic builtins.
> 
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
> 
> 
> 



Re: [PATCH] Unify emit_{pre,post}_atomic_barrier across Alpha, ARM, MIPS and TileGX

2012-06-19 Thread Maxim Kuvyrkov
On 15/06/2012, at 11:16 AM, Richard Henderson wrote:

> On 2012-06-14 16:06, Maxim Kuvyrkov wrote:
>> 2012-06-15  Maxim Kuvyrkov  
>> 
>>  * emit-rtl.c (need_atomic_barrier_p): New function.
>>  * emit-rtl.h (need_atomic_barrier_p): Declare it.
>>  * config/alpha/alpha.c (alpha_{pre,post}_atomic_barrier): Remove, use
>>  generic version instead.
>>  * config/arm/arm.c (arm_{pre,post}_atomic_barrier): Remove, use
>>  generic version instead.
>>  * config/mips/mips.c (mips_{pre,post}_atomic_barrier_p): Remove, use
>>  generic version instead.
>>  * config/tilegx/tilegx.c, config/tilegx/tilegx-protos.h,
>>  * config/tilegx/sync.md (tilegx_{pre,post}_atomic_barrier): Remove, use
>>  generic version instead.
> 
> 
> Ok.

Since I didn't hear any objections from target maintainers I've checked in this 
patch.
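
For readers who have not seen the per-target helpers, a generic predicate of this kind might look like the sketch below. This is an assumption about its general shape, written against the __ATOMIC_* constants GCC exposes to user code, and not a copy of the committed need_atomic_barrier_p. In particular, treating consume like acquire is a conservative choice made here.

```c
#include <stdbool.h>

/* Decide whether a barrier is needed before (PRE true) or after
   (PRE false) an atomic operation with memory model MODEL.  */
bool
sketch_need_atomic_barrier_p (int model, bool pre)
{
  switch (model)
    {
    case __ATOMIC_RELAXED:
      /* No ordering requested, so no barrier on either side.  */
      return false;
    case __ATOMIC_RELEASE:
      /* Earlier accesses must complete first: barrier before.  */
      return pre;
    case __ATOMIC_CONSUME:
    case __ATOMIC_ACQUIRE:
      /* Later accesses must not move up: barrier after.  */
      return !pre;
    default:
      /* __ATOMIC_ACQ_REL and __ATOMIC_SEQ_CST: both sides.  */
      return true;
    }
}
```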

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




C++ PATCH for c++/53651 (ICE with ill-formed use of decltype)

2012-06-19 Thread Jason Merrill

A decltype doesn't have a name.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.7.
commit bab2f5e9e77bd41b91ca6eae34483eb159307519
Author: Jason Merrill 
Date:   Thu Jun 14 17:28:08 2012 -0700

	PR c++/53651
	* name-lookup.c (constructor_name_p): Don't try to look at the
	name of a DECLTYPE_TYPE.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 0f28820..cc8439c 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1966,6 +1966,11 @@ constructor_name_p (tree name, tree type)
   if (TREE_CODE (name) != IDENTIFIER_NODE)
 return false;
 
+  /* These don't have names.  */
+  if (TREE_CODE (type) == DECLTYPE_TYPE
+  || TREE_CODE (type) == TYPEOF_TYPE)
+return false;
+
   ctor_name = constructor_name_full (type);
   if (name == ctor_name)
 return true;
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype37.C b/gcc/testsuite/g++.dg/cpp0x/decltype37.C
new file mode 100644
index 000..c885e9a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype37.C
@@ -0,0 +1,14 @@
+// PR c++/53651
+// { dg-do compile { target c++11 } }
+
+template struct wrap { void bar(); };
+
+template auto foo(T* t) -> wrap* { return 0; }
+
+template
+struct holder : decltype(*foo((T*)0)) // { dg-error "class type" }
+{
+using decltype(*foo((T*)0))::bar; // { dg-error "is not a base" }
+};
+
+holder h;


Re: [patch] Remove NO_IMPLICIT_EXTERN_C target macro

2012-06-19 Thread Hans-Peter Nilsson
On Mon, 18 Jun 2012, Steven Bosscher wrote:
> The attached patch removes NO_IMPLICIT_EXTERN_C, and replaces its sole
> user with IMPLICIT_EXTERN_C to avoid the double negations (#ifndef
> NO_IMPLICIT_EXTERN_C, etc.).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?

I saw it wasn't part of this patch so: when and if this
eventually gets in, please don't forget to poison it, see
system.h.
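
For reference, poisoning uses the GCC poison pragma; a minimal sketch is below. The function is a hypothetical example, and the exact placement inside system.h is not shown here.

```c
/* Once an identifier is poisoned, any later appearance of it in this
   translation unit is a hard error, which catches stale uses of a
   removed target macro.  */
#pragma GCC poison NO_IMPLICIT_EXTERN_C

/* Code after the pragma may only test the positive macro.  */
int
sketch_uses_implicit_extern_c (void)
{
#ifdef IMPLICIT_EXTERN_C
  return 1;
#else
  return 0;
#endif
}
```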

brgds, H-P


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-06-19 Thread Teresa Johnson
Ping.
Teresa

On Fri, May 18, 2012 at 7:21 AM, Teresa Johnson  wrote:
> Ping?
> Teresa
>
> On Fri, May 11, 2012 at 6:11 AM, Teresa Johnson  wrote:
>> Ping?
>> Teresa
>>
>> On Fri, May 4, 2012 at 3:41 PM, Teresa Johnson  wrote:
>>>
>>> On David's suggestion, I have removed the changes that rename niter_desc
>>> to
>>> loop_desc from this patch to focus the patch on the unrolling changes. I
>>> can
>>> submit a cleanup patch to do the renaming as soon as this one goes in.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  Ok for trunk?
>>>
>>> Thanks,
>>> Teresa
>>>
>>> Here is the new description of improvements from the original patch:
>>>
>>> Improved patch based on feedback. Main changes are:
>>>
>>> 1) Improve efficiency by caching loop analysis results in the loop
>>> auxiliary
>>> info structure hanging off the loop structure. Added a new routine,
>>> analyze_loop_insns, to fill in information about the average and total
>>> number
>>> of branches, as well as whether there are any floating point set and call
>>> instructions in the loop. The new routine is invoked when we first create
>>> a
>>> loop's niter_desc struct, and the caller (get_simple_loop_desc) has been
>>> modified to handle creating a niter_desc for the fake outermost loop.
>>>
>>> 2) Improvements to max_unroll_with_branches:
>>> - Treat the fake outermost loop (the procedure body) as we would a hot
>>> outer
>>> loop, i.e. compute the max unroll looking at its nested branches, instead
>>> of
>>> shutting off unrolling when we reach the fake outermost loop.
>>> - Pull the checks previously done in the caller into the routine (e.g.
>>> whether the loop iterates frequently or contains fp instructions).
>>> - Fix a bug in the previous version that sometimes caused overflow in the
>>> new unroll factor.
>>>
>>> 3) Remove float variables, and use integer computation to compute the
>>> average number of branches in the loop.
>>>
>>> 4) Detect more types of floating point computations in the loop by walking
>>> all set instructions, not just single sets.
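
Point 3 above (integer instead of float arithmetic for the branch average) can be illustrated with the sketch below. It is not taken from the patch: the function name, the frequency inputs and the round-to-nearest choice are all assumptions for this example.

```c
/* Average number of branches executed per loop iteration, computed
   with integers only.  BRANCH_FREQ holds the execution frequency of
   each branch in the loop body and HEADER_FREQ the frequency of the
   loop header, both in the same arbitrary units.  The result is
   rounded to the nearest integer.  */
unsigned
sketch_average_branches (const unsigned *branch_freq, unsigned n,
                         unsigned header_freq)
{
  unsigned long long total = 0;
  unsigned i;

  if (header_freq == 0)
    return 0;
  for (i = 0; i < n; i++)
    total += branch_freq[i];
  /* Adding half the divisor before dividing rounds to nearest.  */
  return (unsigned) ((total + header_freq / 2) / header_freq);
}
```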
>>>
>>> 2012-05-04   Teresa Johnson  
>>>
>>>        * doc/invoke.texi: Update the documentation with new params.
>>>        * loop-unroll.c (max_unroll_with_branches): New function.
>>>        (decide_unroll_constant_iterations,
>>> decide_unroll_runtime_iterations):
>>>        Add heuristic to avoid increasing branch mispredicts when
>>> unrolling.
>>>        (decide_peel_simple, decide_unroll_stupid): Retrieve number of
>>>        branches from niter_desc instead of via function that walks loop.
>>>        * loop-iv.c (get_simple_loop_desc): Invoke new analyze_loop_insns
>>>        function, and add guards to enable this function to work for the
>>>        outermost loop.
>>>        * cfgloop.c (insn_has_fp_set, analyze_loop_insns): New functions.
>>>        (num_loop_branches): Remove.
>>>        * cfgloop.h (struct loop_desc): Added new fields to cache
>>> additional
>>>        loop analysis information.
>>>        (num_loop_branches): Remove.
>>>        (analyze_loop_insns): Declare.
>>>        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
>>>        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.
>>>
>>> Index: doc/invoke.texi
>>> ===
>>> --- doc/invoke.texi     (revision 187013)
>>> +++ doc/invoke.texi     (working copy)
>>> @@ -8842,6 +8842,12 @@ The maximum number of insns of an unswitched loop.
>>>  @item max-unswitch-level
>>>  The maximum number of branches unswitched in a single loop.
>>>
>>> +@item min-iter-unroll-with-branches
>>> +Minimum iteration count to ignore branch effects when unrolling.
>>> +
>>> +@item unroll-outer-loop-branch-budget
>>> +Maximum number of branches allowed in hot outer loop region after unroll.
>>> +
>>>  @item lim-expensive
>>>  The minimum cost of an expensive expression in the loop invariant motion.
>>>
>>> Index: loop-unroll.c
>>> ===
>>> --- loop-unroll.c       (revision 187013)
>>> +++ loop-unroll.c       (working copy)
>>> @@ -152,6 +152,99 @@ static void combine_var_copies_in_loop_exit (struc
>>>                                             basic_block);
>>>  static rtx get_expansion (struct var_to_expand *);
>>>
>>> +/* Compute the maximum number of times LOOP can be unrolled without
>>> exceeding
>>> +   a branch budget, which can increase branch mispredictions. The number
>>> of
>>> +   branches is computed by weighting each branch with its expected
>>> execution
>>> +   probability through the loop based on profile data. If no profile
>>> feedback
>>> +   data exists, simply return the current NUNROLL factor.  */
>>> +
>>> +static unsigned
>>> +max_unroll_with_branches(struct loop *loop, unsigned nunroll)
>>> +{
>>> +  struct loop *outer;
>>> +  struct niter_desc *outer_desc = 0;
>>> +  int outer_niters = 1;
>>> +  int frequent_iteration_threshold;
>>> +  unsigned branch_budget;
>>> +  s

Re: [PING ARM Patches] PR53447: optimizations of 64bit ALU operation with constant

2012-06-19 Thread Michael Hope
On 18 June 2012 22:17, Carrot Wei  wrote:
> Hi
>
> Could ARM maintainers review following patches?
>
> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00497.html
> 64bit add/sub constants.
>
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01834.html
> 64bit and with constants.
>
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01974.html
> 64bit xor with constants.
>
> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00287.html
> 64bit ior with constants.

Hi Carrot.  Out of interest, how do these interact with the 64 bit in
NEON patches that Andrew has been doing?  They seem to touch many of
the same patterns and I'm concerned that they'd cause GCC to prefer
core registers instead of NEON, especially as the constant values you
can use in a vmov are limited.

There's a (in progress) summary of the current state for the standard
C operators here:
 https://wiki.linaro.org/MichaelHope/Sandbox/64BitOperations

-- Michael


RE: [PATCH, ARM] New CPU support for Marvell PJ4 cores

2012-06-19 Thread Yi-Hsiu Hsu
marvell-pj4 is added to BE8_LINK_SPEC.

Modified patch is attached.

Thanks!

B.R.
Yi-Hsiu, Hsu

-Original Message-
From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org] 
Sent: Thursday, June 14, 2012 2:19 AM
To: Yi-Hsiu Hsu
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] New CPU support for Marvell PJ4 cores

On 29 May 2012 10:07, Yi-Hsiu Hsu  wrote:
> Hi,
>
> This patch maintains Marvell PJ4 cores pipeline description.
> Run arm testsuite on arm-linux-gnueabi and no extra regressions are found.
>
>        * config/arm/marvell-pj4.md: New marvell-pj4 pipeline description.
>        * config/arm/arm.c (arm_issue_rate): Add marvell_pj4.
>        * config/arm/arm-cores.def: Add core marvell-pj4.
>        * config/arm/arm-tune.md: Regenerated.
>        * config/arm/arm-tables.opt: Regenerated.
>        * doc/invoke.texi: Added entry for marvell-pj4.

This command line option should also be added to BE8_LINK_SPEC similar
to what's done for the other v7-a cores.

Ok with that change.

regards,
Ramana



>
>
> Thanks!
>
> P.S. I created the patch against revision 187308, but that revision does not 
> build, so I applied the patch to revision 187623, which builds successfully 
> and passes the testsuite.
>


marvell-pj4-core.patch
Description: marvell-pj4-core.patch


[PR debug/53682] avoid crash in cselib promote_debug_loc

2012-06-19 Thread Alexandre Oliva
When promote_debug_loc was first introduced, it would never be called
with a NULL loc list.  However, because of the strategy of temporarily
resetting loc lists before recursion introduced a few months ago in
alias.c, the earlier assumption no longer holds.

This patch adjusts promote_debug_loc to deal with this case.
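
The fix relies on && short-circuiting, so the pointer is tested before any field is read. A stripped-down sketch of the same guard, with a hypothetical structure standing in for elt_loc_list and booleans standing in for the DEBUG_INSN_P checks:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a location list element.  */
struct sketch_loc
{
  const void *setting_insn;   /* Insn that set the location, or NULL.  */
  bool set_by_debug_insn;     /* Stand-in for the debug insn check.  */
};

/* True if the location should be promoted.  Because && evaluates
   left to right and stops at the first false operand, L is never
   dereferenced when it is NULL.  */
bool
sketch_should_promote (const struct sketch_loc *l, bool current_is_debug)
{
  return l != NULL
         && l->setting_insn != NULL
         && l->set_by_debug_insn
         && !current_is_debug;
}
```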

Ok to install?


for  gcc/ChangeLog
from  Alexandre Oliva  

	PR debug/53682
	* cselib.c (promote_debug_loc): Don't crash on NULL argument.

Index: gcc/cselib.c
===
--- gcc/cselib.c.orig	2012-06-17 22:52:27.740087279 -0300
+++ gcc/cselib.c	2012-06-18 08:55:32.948832112 -0300
@@ -322,7 +322,7 @@ new_elt_loc_list (cselib_val *val, rtx l
 static inline void
 promote_debug_loc (struct elt_loc_list *l)
 {
-  if (l->setting_insn && DEBUG_INSN_P (l->setting_insn)
+  if (l && l->setting_insn && DEBUG_INSN_P (l->setting_insn)
   && (!cselib_current_insn || !DEBUG_INSN_P (cselib_current_insn)))
 {
   n_debug_values--;

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: [PR49888, VTA] don't keep VALUEs bound to modified MEMs

2012-06-19 Thread Alexandre Oliva
On Jun 16, 2012, "H.J. Lu"  wrote:

> If I understand it correctly, the new approach fails to handle push
> properly.

It's actually cselib that didn't deal with push properly, so it thinks
incoming stack arguments may be clobbered by them.  But that's not the
whole story, unfortunately.  I still don't have a complete fix for the
problem, but I have some patches that restore nearly all of the passes.

The first one extends RTX alias analysis so that cselib can recognize
that (mem:SI ARGP) and (mem:SI (plus (and (plus ARGP #-4) #-32) #-4))
don't alias.  Before the patch, we'd go for infinite sized objects upon
AND.
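
The alignment test used in the alias.c hunk further down can be tried in isolation. The sketch below lifts the same expression into two standalone helpers; the sketch_* names are made up here, and only the size adjustment is shown, not the offset adjustment. A mask such as -32 qualifies because its negation is a power of two, and aligning an address down by it can move the access back by up to 31 bytes, so the size grows by 31.

```c
#include <stdbool.h>

/* True if SC is a negative constant whose negation is a power of
   two, i.e. a mask such as -32 that aligns an address downwards.
   UC & -UC isolates the lowest set bit of UC, and for such a mask
   that lowest bit is the alignment itself.  */
bool
sketch_is_alignment_mask (long long sc)
{
  unsigned long long uc = (unsigned long long) sc;

  return sc < 0 && -uc == (uc & -uc);
}

/* Size of the region an access of SIZE bytes may touch once its
   address has been ANDed with alignment mask SC.  This mirrors the
   "xsize -= sc + 1" step in the hunk.  */
long long
sketch_widened_size (long long size, long long sc)
{
  return size - (sc + 1);
}
```

For a 4-byte access and a mask of -32 this gives 4 + 31 = 35 bytes.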

The second introduces an entry-point equivalence between ARGP and SP, so
that SP references in push and stack-align sequences can be
canonicalized to ARGP-based.

The third introduces address canonicalization that uses information in
the dataflow variable set in addition to the static cselib table.  This
is the one I'm still working on, because some expressions still fail to
canonicalize to ARGP although they could.

The fourth removes a now-redundant equivalence from the dynamic table;
the required information is always preserved in the static table.

I've regstrapped (and checked results! :-) all of these on
x86_64-linux-gnu and i686-linux-gnu.  It fixes all visible regressions
in x86_64-linux-gnu, and nearly all on i686-linux-gnu.

May I check these in and keep on working to complete the fix, or should
I revert the original patch and come back only with a patchset that
fixes all debug info regressions?

for  gcc/ChangeLog
from  Alexandre Oliva  

	PR debug/53671
	PR debug/49888
	* alias.c (memrefs_conflict_p): Improve handling of AND for
	alignment.
	
Index: gcc/alias.c
===
--- gcc/alias.c.orig	2012-06-17 22:52:27.551102225 -0300
+++ gcc/alias.c	2012-06-17 22:59:00.674994588 -0300
@@ -2103,17 +2103,31 @@ memrefs_conflict_p (int xsize, rtx x, in
  at least as large as the alignment, assume no other overlap.  */
   if (GET_CODE (x) == AND && CONST_INT_P (XEXP (x, 1)))
 {
-  if (GET_CODE (y) == AND || ysize < -INTVAL (XEXP (x, 1)))
+  HOST_WIDE_INT sc = INTVAL (XEXP (x, 1));
+  unsigned HOST_WIDE_INT uc = sc;
+  if (xsize > 0 && sc < 0 && -uc == (uc & -uc))
+	{
+	  xsize -= sc + 1;
+	  c -= sc;
+	}
+  else if (GET_CODE (y) == AND || ysize < -INTVAL (XEXP (x, 1)))
 	xsize = -1;
   return memrefs_conflict_p (xsize, canon_rtx (XEXP (x, 0)), ysize, y, c);
 }
   if (GET_CODE (y) == AND && CONST_INT_P (XEXP (y, 1)))
 {
+  HOST_WIDE_INT sc = INTVAL (XEXP (y, 1));
+  unsigned HOST_WIDE_INT uc = sc;
+  if (ysize > 0 && sc < 0 && -uc == (uc & -uc))
+	{
+	  ysize -= sc + 1;
+	  c += sc;
+	}
   /* ??? If we are indexing far enough into the array/structure, we
 	 may yet be able to determine that we can not overlap.  But we
 	 also need to that we are far enough from the end not to overlap
 	 a following reference, so we do nothing with that for now.  */
-  if (GET_CODE (x) == AND || xsize < -INTVAL (XEXP (y, 1)))
+  else if (GET_CODE (x) == AND || xsize < -INTVAL (XEXP (y, 1)))
 	ysize = -1;
   return memrefs_conflict_p (xsize, x, ysize, canon_rtx (XEXP (y, 0)), c);
 }
for  gcc/ChangeLog
from  Alexandre Oliva  

	PR debug/53671
	PR debug/49888
	* var-tracking.c (vt_initialize): Record initial offset between
	arg pointer and stack pointer.

Index: gcc/var-tracking.c
===
--- gcc/var-tracking.c.orig	2012-06-17 23:00:45.793675979 -0300
+++ gcc/var-tracking.c	2012-06-17 23:01:02.525351931 -0300
@@ -9507,6 +9507,41 @@ vt_initialize (void)
   valvar_pool = NULL;
 }
 
+  if (MAY_HAVE_DEBUG_INSNS)
+{
+  rtx reg, expr;
+  int ofst;
+  cselib_val *val;
+
+#ifdef FRAME_POINTER_CFA_OFFSET
+  reg = frame_pointer_rtx;
+  ofst = FRAME_POINTER_CFA_OFFSET (current_function_decl);
+#else
+  reg = arg_pointer_rtx;
+  ofst = ARG_POINTER_CFA_OFFSET (current_function_decl);
+#endif
+
+  ofst -= INCOMING_FRAME_SP_OFFSET;
+
+  val = cselib_lookup_from_insn (reg, GET_MODE (reg), 1,
+ VOIDmode, get_insns ());
+  preserve_value (val);
+  cselib_preserve_cfa_base_value (val, REGNO (reg));
+  expr = plus_constant (GET_MODE (stack_pointer_rtx),
+			stack_pointer_rtx, -ofst);
+  cselib_add_permanent_equiv (val, expr, get_insns ());
+
+  if (ofst)
+	{
+	  val = cselib_lookup_from_insn (stack_pointer_rtx,
+	 GET_MODE (stack_pointer_rtx), 1,
+	 VOIDmode, get_insns ());
+	  preserve_value (val);
+	  expr = plus_constant (GET_MODE (reg), reg, ofst);
+	  cselib_add_permanent_equiv (val, expr, get_insns ());
+	}
+}
+
   /* In order to factor out the adjustments made to the stack pointer or to
  the hard frame pointer and thus be able to use DW_OP_fbreg operations
  instead of individual location lists, we're going to rewrite MEMs based
for  gcc/Change

Re: [PATCH 3/3] Handle const_vector in mulv4si3 for pre-sse4.1.

2012-06-19 Thread Uros Bizjak
On Mon, Jun 18, 2012 at 10:06 PM, Richard Henderson  wrote:

>> Please note that you will probably hit PR33329, this is the reason
>> that we expand multiplications after reload. Please see [1] for
>> further explanation. There is gcc.target/i386/pr33329.c test to cover
>> this issue, but it is not effective anymore since the simplification
>> happens at tree level.
>>
>> [1] http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00668.html
>
>
> Well, even with the test case changed s/*2/*12345/ so that the
> test case continues to use a multiply instead of devolving to
> a shift, does not fail.
>
> There have been a lot of changes since 2007; I might hope that
> the underlying bug has been fixed.

Should we also change mul3 and mul3 from
pre-reload splitter to an expander in the same way?

Uros.