Re: [Patch] PR 65315 - Fix alignment of local variables

2015-03-05 Thread Richard Biener
On Wed, Mar 4, 2015 at 8:50 PM, Steve Ellcey  wrote:
> While examining some MIPS code I noticed that GCC did not seem to be
> fully honoring the aligned attribute on some local variables.  I submitted
> PR middle-end/65315 to record the bug, and I think I now understand it and
> have a fix.  The problem is that expand_stack_vars assumes the first entry
> in stack_vars_sorted has the largest alignment.  While all variables with
> alignment greater than MAX_SUPPORTED_STACK_ALIGNMENT do come before all
> variables whose alignment is less than MAX_SUPPORTED_STACK_ALIGNMENT,
> within that large-alignment group the variables are sorted by size,
> not by alignment.
>
> So my fix was to update large_align in expand_stack_vars if needed.
>
> I have verified the fix on the MIPS test case in PR 65315 and am doing a
> regression test now.  OK to check in if there are no regressions?

It looks like large_align vars are dynamically allocated, and thus I suppose
they should be sorted as sizeof (void *).

Do you have a testcase?

Ok.

Thanks,
Richard.

> I wasn't sure how to create a generic test case; I was checking the
> alignment on MIPS by hand, looking for the shift-right/shift-left
> instructions that create an aligned pointer, but that obviously doesn't
> work on other architectures.
>
> Steve Ellcey
> sell...@imgtec.com
>
>
> 2015-03-04  Steve Ellcey  
>
> PR middle-end/65315
> * cfgexpand.c (expand_stack_vars): Update large_align to maximum
> needed alignment.
>
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 7dfe1f6..569cd0d 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -973,6 +973,13 @@ expand_stack_vars (bool (*pred) (size_t), struct 
> stack_vars_data *data)
>   i = stack_vars_sorted[si];
>   alignb = stack_vars[i].alignb;
>
> + /* All "large" alignment decls come before all "small" alignment
> +decls, but "large" alignment decls are not sorted based on
> +their alignment.  Increase large_align to track the largest
> +required alignment.  */
> + if ((alignb * BITS_PER_UNIT) > large_align)
> +   large_align = alignb * BITS_PER_UNIT;
> +
>   /* Stop when we get to the first decl with "small" alignment.  */
>   if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
> break;
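
For illustration, a minimal sketch of the kind of code affected (hypothetical; not the PR 65315 reproducer) - locals whose requested alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT on typical targets and therefore go through the large-alignment path in expand_stack_vars:

/* Hypothetical example, not the PR 65315 test case.  Both alignment
   requests exceed MAX_SUPPORTED_STACK_ALIGNMENT on typical targets, so
   the variables take the "large" alignment path in expand_stack_vars;
   note the larger size goes with the smaller alignment.  */
extern void use (void *);

void
foo (void)
{
  char big[128] __attribute__ ((aligned (4096)));
  char small[16] __attribute__ ((aligned (8192)));

  use (big);
  use (small);

  /* A target-independent runtime check: the low bits must be zero.  */
  if ((__UINTPTR_TYPE__) small & (8192 - 1))
    __builtin_abort ();
}

Checking the low bits of the address at run time, as in the last lines, is one generic way to verify the alignment without grepping for MIPS shift instructions.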


Re: [patch/committed] PR middle-end/65233 make walk-ssa_copies handle empty PHIs

2015-03-05 Thread Richard Biener
On Thu, Mar 5, 2015 at 1:54 AM, Jan Hubicka  wrote:
>> >
>> > It gets passed the valueize callback now which returns NULL_TREE for
>> > SSA names we can't follow.
>>
>> Btw, for match-and-simplify I had to use that as default for fold_stmt
>> _exactly_ because of the call to fold_stmt from replace_uses_by
>> via merge-blocks from cfgcleanup.  This is because replace-uses-by
>> doesn't have all uses replaced before it folds the stmt!
>>
>> We also have the "weaker" in-place flag.
>>
>> 2015-03-04  Richard Biener  
>>
>> PR middle-end/65233
>> * ipa-polymorphic-call.c: Include tree-ssa-operands.h and
>> tree-into-ssa.h.
>> (walk_ssa_copies): Revert last change.  Instead do not walk
>> SSA names registered for SSA update.
>
> Maybe include the patch?  It should not be a problem to make the function
> valueize everything it looks into.

I attached it.

Well, I think for stage1 the fix is to not call fold_stmt from CFG hooks or
CFG cleanup.  Merge-blocks can just demote PHIs to assignments and
leave propagation to followup cleanups (we can of course propagate
virtual operands).

I can try to do it for this stage (I can only find merge-blocks doing this)
as well.  Opinions?

Richard.

> Honza


Re: [PATCH] Fix PR rtl-optimization/65067

2015-03-05 Thread Richard Biener
On Thu, Mar 5, 2015 at 8:10 AM, Bernd Edlinger
 wrote:
> Hi,
>
> on ARM we have a code quality regression because of the strict volatile
> bitfields handling.  The reason is that the current implementation directly
> jumps to store_fixed_bit_field_1, which emits a sequence of and/or/shift
> expressions.  This turns out to be too complex for combine to see the
> opportunity to use a "bfi" instruction.
>
> But if -fno-strict-volatile-bitfields is used, store_bit_field can use the
> EP_insv code pattern, which results in "bfi" instructions.
> The only problem is that store_bit_field is free to use _any_ possible
> access mode.  But if we load the value into a register first, we can safely
> use store_bit_field on the register and move the result back.
>
>
> Boot-Strapped and regression-tested on Cortex-M3.
>
> OK for trunk?

Hmm.  As you also modify the no-strict-volatile-bitfield path I'm not sure
you don't regress the case where EP_insv can work on memory.  I agree
that simplifying the strict-volatile-bitfield path to extract the memory
within strict-volatile-bitfield constraints to a reg and then using the regular
path is a good thing.

Eric?

Thanks,
Richard.

>
> Thanks
> Bernd.
>
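
For reference, a minimal sketch of the kind of access under discussion (hypothetical structure and field names; not the PR 65067 test case).  With -fstrict-volatile-bitfields the store below is expanded through store_fixed_bit_field_1 as an and/or/shift sequence; the patch first loads the word into a register so that store_bit_field can use the EP_insv ("bfi") path:

/* Hypothetical example; the struct layout and names are made up.  */
struct dev_reg
{
  volatile unsigned int mode : 3;
  volatile unsigned int rest : 29;
};

void
set_mode (struct dev_reg *r, unsigned int m)
{
  r->mode = m;   /* read-modify-write of a volatile bitfield */
}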


Re: [PATCH] Fix another wrong-code bug with -fstrict-volatile-bitfields

2015-03-05 Thread Richard Biener
On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
 wrote:
> bounced... again, without html.
>
>
> Hi Richard,
>
> while working on another bug in the area of -fstrict-volatile-bitfields
> I became aware of another example where -fstrict-volatile-bitfields may
> generate wrong code.  This is reproducible on a !STRICT_ALIGNMENT target
> like x86_64.
>
> The problem is that strict_volatile_bitfield_p tries to allow more than
> necessary if !STRICT_ALIGNMENT.  Everything works OK on ARM, for instance.
>
> If this function returns true, we may later call narrow_bit_field_mem, and
> the check in strict_volatile_bitfield_p should mirror the logic there:
> narrow_bit_field_mem just uses GET_MODE_BITSIZE (mode) and does not
> care about STRICT_ALIGNMENT, and in the end *new_bitnum + bitsize may
> reach beyond the end of the region.  This causes store_fixed_bit_field_1
> to silently fail to generate correct code.

Hmm, but the comment sounds like if using GET_MODE_ALIGNMENT is
more correct (even for !strict-alignment) - if mode is SImode and mode
alignment is 16 (HImode aligned) then we don't need to split the load
if bitnum is 16 and bitsize is 32.

So maybe narrow_bit_field_mem needs to be fixed as well?

Thanks,
Richard.

> The attached patch was boot-strapped and
> regression-tested on x86_64-linux-gnu.
>
> OK for trunk and 4.9?
>
>
> Thanks
> Bernd.
>


RE: [PATCH] Fix PR rtl-optimization/65067

2015-03-05 Thread Bernd Edlinger
Hi,

On Thu, 5 Mar 2015 09:52:54, Richard Biener wrote:
>
> On Thu, Mar 5, 2015 at 8:10 AM, Bernd Edlinger
>  wrote:
>> Hi,
>>
>> on ARM we have a code quality regression because of the strict volatile
>> bitfields handling.  The reason is that the current implementation directly
>> jumps to store_fixed_bit_field_1, which emits a sequence of and/or/shift
>> expressions.  This turns out to be too complex for combine to see the
>> opportunity to use a "bfi" instruction.
>>
>> But if -fno-strict-volatile-bitfields is used, store_bit_field can use the
>> EP_insv code pattern, which results in "bfi" instructions.
>> The only problem is that store_bit_field is free to use _any_ possible
>> access mode.  But if we load the value into a register first, we can safely
>> use store_bit_field on the register and move the result back.
>>
>>
>> Boot-Strapped and regression-tested on Cortex-M3.
>>
>> OK for trunk?
>
> Hmm. As you also modify the no-strict-volatile-bitfield path I'm not sure
> you don't regress the case where EP_insv can work on memory. I agree
> that simplifying the strict-volatile-bitfield path to extract the memory
> within strict-volatile-bitfield constraints to a reg and then using the 
> regular
> path is a good thing.
>

I tried _not_ to touch the no-strict-volatile-bitfield path.
Where did you see that?

Thanks
Bernd.

> Eric?
>
> Thanks,
> Richard.
>
>>
>> Thanks
>> Bernd.
>>
  

Re: [c-family] Fix -fdump-ada-spec ICEs

2015-03-05 Thread Dominique Dhumieres
Hi Eric,

Following this commit (r221088) testing dump-ada-spec-3.C with

make -k check-g++ RUNTESTFLAGS="dg.exp=other/dump-ada-spec-3.C"

generates a lot of *.ads files in the gcc/testsuite/g++ directory
which are not cleaned up after completion.

Any idea about how to do the cleaning?

TIA

Dominique



Re: [PATCH] Fix PR rtl-optimization/65067

2015-03-05 Thread Richard Biener
On Thu, Mar 5, 2015 at 10:03 AM, Bernd Edlinger
 wrote:
> Hi,
>
> On Thu, 5 Mar 2015 09:52:54, Richard Biener wrote:
>>
>> On Thu, Mar 5, 2015 at 8:10 AM, Bernd Edlinger
>>  wrote:
>>> Hi,
>>>
>>> on ARM we have a code quality regression because of the strict volatile
>>> bitfields handling.  The reason is that the current implementation directly
>>> jumps to store_fixed_bit_field_1, which emits a sequence of and/or/shift
>>> expressions.  This turns out to be too complex for combine to see the
>>> opportunity to use a "bfi" instruction.
>>>
>>> But if -fno-strict-volatile-bitfields is used, store_bit_field can use the
>>> EP_insv code pattern, which results in "bfi" instructions.
>>> The only problem is that store_bit_field is free to use _any_ possible
>>> access mode.  But if we load the value into a register first, we can safely
>>> use store_bit_field on the register and move the result back.
>>>
>>>
>>> Boot-Strapped and regression-tested on Cortex-M3.
>>>
>>> OK for trunk?
>>
>> Hmm. As you also modify the no-strict-volatile-bitfield path I'm not sure
>> you don't regress the case where EP_insv can work on memory. I agree
>> that simplifying the strict-volatile-bitfield path to extract the memory
>> within strict-volatile-bitfield constraints to a reg and then using the 
>> regular
>> path is a good thing.
>>
>
> I tried _not_ to touch the no-strict-volatile-bitfield path.
> Where did you see that?

You changed store_bit_field - ah, sorry, context missing in the patch.  So yes,
the patch is ok.

Thanks,
Richard.

> Thanks
> Bernd.
>
>> Eric?
>>
>> Thanks,
>> Richard.
>>
>>>
>>> Thanks
>>> Bernd.
>>>
>


[PATCH, stage1] Move insns without introducing new temporaries in loop2_invariant

2015-03-05 Thread Thomas Preud'homme
Note: this is stage1 material.

Currently the loop2_invariant pass hoists instructions out of loops by creating
a new temporary for the destination register of that instruction and leaving
behind a mov from the new temporary to the old register, as shown below:

loop header
start of loop body
//stuff
(set (reg 128) (const_int 0))
//other stuff
end of loop body

becomes:

(set (reg 129) (const_int 0))
loop header
start of loop body
//stuff
(set (reg 128) (reg 129))
//other stuff
end of loop body

This is one of the errors that led to a useless ldr ending up inside a loop
(PR64616).  This patch fixes this specific bit (some other bit was fixed in [1])
by simply moving the instruction if that is known to be safe.  This is decided by
looking at all the uses of the register set in the instruction and checking
that (i) they are all dominated by the instruction and (ii) there is no other
def in the loop that could end up reaching one of the uses.
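
For illustration, a small C loop whose body contains such an invariant set might look like this (hypothetical example; the RTL shown above is what matters, not this exact source):

/* Hypothetical source containing a loop-invariant set, roughly
   corresponding to the (set (reg 128) (const_int 0)) above.  */
int
sum_masked (const int *a, int n, int flag)
{
  int acc = 0;
  for (int i = 0; i < n; i++)
    {
      int mask = flag ? -1 : 0;   /* loop-invariant; candidate for hoisting */
      acc += a[i] & mask;
    }
  return acc;
}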

[1] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00933.html

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-02-16  Thomas Preud'homme  

* dominance.c (nearest_common_dominator_for_set): Fix A_Dominated_by_B
code in comment.
* loop-invariant.c (can_move_invariant_reg): New.
(move_invariant_reg): Call above new function to decide whether
instruction can just be moved, skipping creation of temporary
register.

*** gcc/testsuite/ChangeLog ***

2015-02-16  Thomas Preud'homme  

* gcc.dg/loop-7.c: Run on all targets and check for loop2_invariant
being able to move instructions without introducing new temporary
register.
* gcc.dg/loop-8.c: New test.

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 33d4ae4..09c8c90 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -982,7 +982,7 @@ nearest_common_dominator_for_set (enum cdi_direction dir, 
bitmap blocks)
 
A_Dominated_by_B (node A, node B)
{
- return DFS_Number_In(A) >= DFS_Number_In(A)
+ return DFS_Number_In(A) >= DFS_Number_In(B)
 && DFS_Number_Out (A) <= DFS_Number_Out(B);
}  */
 
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index f79b497..ab2a45c 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1512,6 +1512,99 @@ replace_uses (struct invariant *inv, rtx reg, bool 
in_group)
   return 1;
 }
 
+/* Whether invariant INV setting REG can be moved out of LOOP, at the end of
+   the block preceding its header.  */
+
+static bool
+can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg)
+{
+  df_ref def, use;
+  bool ret = false;
+  unsigned int i, dest_regno, defs_in_loop_count = 0;
+  rtx_insn *insn = inv->insn;
+  bitmap may_exit, has_exit, always_executed;
+  basic_block *body, bb = BLOCK_FOR_INSN (inv->insn);
+
+  /* We ignore hard register and memory access for cost and complexity reasons.
+ Hard register are few at this stage and expensive to consider as they
+ require building a separate data flow.  Memory access would require using
+ df_simulate_* and can_move_insns_across functions and is more complex.  */
+  if (!REG_P (reg) || HARD_REGISTER_P (reg))
+return false;
+
+  /* Check whether the set is always executed.  We could omit this condition if
+ we know that the register is unused outside of the loop, but it does not
+ seem worth finding out.  */
+  may_exit = BITMAP_ALLOC (NULL);
+  has_exit = BITMAP_ALLOC (NULL);
+  always_executed = BITMAP_ALLOC (NULL);
+  body = get_loop_body_in_dom_order (loop);
+  find_exits (loop, body, may_exit, has_exit);
+  compute_always_reached (loop, body, has_exit, always_executed);
+  /* Find bit position for basic block bb.  */
+  for (i = 0; i < loop->num_nodes && body[i] != bb; i++);
+  if (!bitmap_bit_p (always_executed, i))
+goto cleanup;
+
+  /* Check that all uses reached by the def in insn would still be reached
+ it.  */
+  dest_regno = REGNO (reg);
+  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
+{
+  rtx ref;
+  basic_block use_bb;
+
+  ref = DF_REF_INSN (use);
+  use_bb = BLOCK_FOR_INSN (ref);
+
+  /* Ignore instruction considered for moving.  */
+  if (ref == insn)
+   continue;
+
+  /* Don't consider uses outside loop.  */
+  if (!flow_bb_inside_loop_p (loop, use_bb))
+   continue;
+
+  /* Don't move if a use is not dominated by def in insn.  */
+  if (use_bb == bb && DF_INSN_LUID (insn) > DF_INSN_LUID (ref))
+   goto cleanup;
+  if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
+   goto cleanup;
+
+  /* Check for other defs.  Any other def in the loop might reach a use
+currently reached by the def in insn.  */
+  if (!defs_in_loop_count)
+   {
+ for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG 
(def))
+   {
+ basic_block def_bb = BLOCK_FOR_INSN (DF_REF_INSN (def));
+
+ /* Defs in exit block cannot reach

[Committed] S/390: var-expand1 use default values for peel/unroll limits

2015-03-05 Thread Andreas Krebbel
Hi,

with -march=z10 we use much higher values for peel and unroll limits.
This makes the loop in the testcase disappear at the tree level
already.  With the patch these values are set back to the default
values, making the testcase pass again.

Committed to mainline

Bye,

-Andreas-

2015-03-05  Andreas Krebbel  

* gcc.dg/var-expand1.c: Force max-completely-peel-times and
max-unroll-times back to defaults for s390.

diff --git a/gcc/testsuite/gcc.dg/var-expand1.c 
b/gcc/testsuite/gcc.dg/var-expand1.c
index 7de4cfb..fb039d3 100644
--- a/gcc/testsuite/gcc.dg/var-expand1.c
+++ b/gcc/testsuite/gcc.dg/var-expand1.c
@@ -3,6 +3,7 @@
targets, where each addition is a library call.  */
 /* { dg-require-effective-target hard_float } */
 /* { dg-options "-O2 -funroll-loops --fast-math 
-fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll" } */
+/* { dg-additional-options "--param max-completely-peel-times=16  --param 
max-unroll-times=8" { target s390*-*-* } } */
 
 extern void abort (void);
 



[Committed] S/390: xfail ssa-dom-cse-2

2015-03-05 Thread Andreas Krebbel
Hi,

the initializer value in that testcase ends up in literal pool. As
described in the testcase the optimization does currently not work in
that situation.

Committed to mainline.

Bye,

-Andreas-

2015-03-05  Andreas Krebbel  

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add s390 to the xfail target list.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 867bfb2..df1b763 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -20,5 +20,5 @@ foo ()
 /* See PR63679 and PR64159, if the target forces the initializer to memory then
DOM is not able to perform this optimization.  */
 
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* 
alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* 
alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */



[PATCH][ARM][testsuite] Fix FAIL: gcc.target/arm/macro_defs0.c and macro_defs1.c when -marm forced

2015-03-05 Thread Mantas Mikaitis

Hello,

Tests gcc.target/arm/macro_defs0.c and gcc.target/arm/macro_defs1.c fail
in a multilib configuration that forces -marm, as pointed out in this message:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00483.html .


This patch causes these tests to be classified as UNSUPPORTED rather
than FAIL.


Ok for trunk?

Kind regards,
Mantas M.

2015-03-05  Mantas Mikaitis  

* gcc.target/arm/macro_defs0.c: Add directive to skip
test if -marm is present.
* gcc.target/arm/macro_defs1.c: Add directive to skip
test if -marm is present.

diff --git a/gcc/testsuite/gcc.target/arm/macro_defs0.c b/gcc/testsuite/gcc.target/arm/macro_defs0.c
index 962ff03..684d49f 100644
--- a/gcc/testsuite/gcc.target/arm/macro_defs0.c
+++ b/gcc/testsuite/gcc.target/arm/macro_defs0.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv7-m" } } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" } { "" } } */
 /* { dg-options "-march=armv7-m -mcpu=cortex-m3 -mfloat-abi=soft -mthumb" } */
 
 #ifdef __ARM_FP
diff --git a/gcc/testsuite/gcc.target/arm/macro_defs1.c b/gcc/testsuite/gcc.target/arm/macro_defs1.c
index d5423c7..4cc9ae6 100644
--- a/gcc/testsuite/gcc.target/arm/macro_defs1.c
+++ b/gcc/testsuite/gcc.target/arm/macro_defs1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6-m" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" } { "" } } */
 /* { dg-options "-march=armv6-m -mthumb" } */
 
 #ifdef __ARM_NEON_FP

[PATCH] [AVX512F] Add scatter support for vectorizer

2015-03-05 Thread Petr Murzin
Hello,
This patch adds scatter support to the vectorizer (for AVX512F
instructions).  Please have a look.  Is it OK for stage 1?

2015-03-05  Andrey Turetskiy  

* config/i386/i386-builtin-types.def
(VOID_PFLOAT_HI_V8DI_V16SF_INT): New.
(VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto.
(VOID_PINT_HI_V8DI_V16SI_INT): Ditto.
(VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto.
* config/i386/i386.c
(ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF,
IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
IX86_BUILTIN_SCATTERALTDIV16SI.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df,
__builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di,
__builtin_ia32_scatteraltdiv8si.
(ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF,
IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI,
IX86_BUILTIN_SCATTERALTDIV16SI.
(ix86_vectorize_builtin_scatter): New.
(ix86_initialize_bounds):
(TARGET_VECTORIZE_BUILTIN_SCATTER): Ditto.
* doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): Ditto.
* doc/tm.texi: Regenerate.
* target.def: Add scatter builtin.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Add new
checkings for STMT_VINFO_SCATTER_P.
(vect_check_gather): Rename to ...
(vect_check_gather_scatter): this and enhance number of arguments.
(vect_analyze_data_ref_access): Update comment and returnable values.
(vect_analyze_data_refs): Add maybe_scatter and new checking for it
accordingly.
* tree-vectorizer.h (STMT_VINFO_SCATTER_P(S)): Define.
(vect_check_gather): Rename to ...
(vect_check_gather_scatter): this.
* tree-vect-stmts.c: Ditto.
(vectorizable_store): Add checkings for STMT_VINFO_SCATTER_P.

2015-03-05  Andrey Turetskiy  

testsuite/

* gcc.target/i386/avx512f-scatter-1.c: New.
* gcc.target/i386/avx512f-scatter-2.c: Ditto.
* gcc.target/i386/avx512f-scatter-3.c: Ditto.
* gcc.target/i386/avx512f-scatter-4.c: Ditto.
* gcc.target/i386/avx512f-scatter-5.c: Ditto.

Thanks,
Petr


scatter_patch
Description: Binary data
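
For reference, the canonical scatter-store loop that this kind of vectorizer support targets looks like the following (a hypothetical sketch, not taken from the attached patch or its new tests):

/* Hypothetical scatter-store kernel: the stores go through an index
   vector, so vectorizing it requires a scatter instruction (via the
   new target hook and builtins added by the patch).  */
#define N 1024

float dst[4 * N];
int   idx[N];
float src[N];

void
scatter_store (void)
{
  for (int i = 0; i < N; i++)
    dst[idx[i]] = src[i];
}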


[R220456][4.8] Backport the patch which fixes __ARM_FP & __ARM_NEON_FP predefines

2015-03-05 Thread Mantas Mikaitis

Hello,

This is a backport for gcc-4_8-branch of the patch "[PATCH][ARM]
__ARM_FP & __ARM_NEON_FP defined when -march=armv7-m" posted in:
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00250.html


arm-none-linux-gnueabi/hf tested without new regressions.

OK for gcc-4_8-branch?

Kind regards,
Mantas M.

gcc/ChangeLog:

2015-02-17  Mantas Mikaitis  

* config/arm/arm.h (TARGET_NEON_FP): Remove conditional
definition, define to zero if !TARGET_NEON.
(TARGET_ARM_FP): Add !TARGET_SOFT_FLOAT into the conditional
definition.

gcc/testsuite/ChangeLog:

2015-02-17  Mantas Mikaitis  

* gcc.target/arm/macro_defs0.c: New test.
* gcc.target/arm/macro_defs1.c: New test.
* gcc.target/arm/macro_defs2.c: New test.
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index b4947cd..03a63c1 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2293,17 +2293,17 @@ extern int making_const_table;
point types.  Where bit 1 indicates 16-bit support, bit 2 indicates
32-bit support, bit 3 indicates 64-bit support.  */
 #define TARGET_ARM_FP			\
-  (TARGET_VFP_SINGLE ? 4		\
-  		 : (TARGET_VFP_DOUBLE ? (TARGET_FP16 ? 14 : 12) : 0))
+  (!TARGET_SOFT_FLOAT ? (TARGET_VFP_SINGLE ? 4		\
+			: (TARGET_VFP_DOUBLE ? (TARGET_FP16 ? 14 : 12) : 0)) \
+		  : 0)
 
 
 /* Set as a bit mask indicating the available widths of floating point
types for hardware NEON floating point.  This is the same as
TARGET_ARM_FP without the 64-bit bit set.  */
-#ifdef TARGET_NEON
-#define TARGET_NEON_FP		\
-  (TARGET_ARM_FP & (0xff ^ 0x08))
-#endif
+#define TARGET_NEON_FP \
+  (TARGET_NEON ? (TARGET_ARM_FP & (0xff ^ 0x08)) \
+	   : 0)
 
 /* The maximum number of parallel loads or stores we support in an ldm/stm
instruction.  */
diff --git a/gcc/testsuite/gcc.target/arm/macro_defs0.c b/gcc/testsuite/gcc.target/arm/macro_defs0.c
new file mode 100644
index 000..198243e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/macro_defs0.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options"
+   { *-*-* } { "-march=*" } {"-march=armv7-m"} } */
+/* { dg-skip-if "avoid conflicting multilib options"
+   { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
+/* { dg-options "-march=armv7-m -mcpu=cortex-m3 -mfloat-abi=soft -mthumb" } */
+
+#ifdef __ARM_FP
+#error __ARM_FP should not be defined
+#endif
+
+#ifdef __ARM_NEON_FP
+#error __ARM_NEON_FP should not be defined
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/macro_defs1.c b/gcc/testsuite/gcc.target/arm/macro_defs1.c
new file mode 100644
index 000..075b71b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/macro_defs1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options"
+   { *-*-* } { "-march=*" } { "-march=armv6-m" } } */
+/* { dg-options "-march=armv6-m -mthumb" } */
+
+#ifdef __ARM_NEON_FP
+#error __ARM_NEON_FP should not be defined
+#endif
+
diff --git a/gcc/testsuite/gcc.target/arm/macro_defs2.c b/gcc/testsuite/gcc.target/arm/macro_defs2.c
new file mode 100644
index 000..8a8851d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/macro_defs2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=cortex-a15 -mfpu=neon-vfpv4" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+
+#ifndef __ARM_NEON_FP
+#error  __ARM_NEON_FP is not defined but should be
+#endif
+
+#ifndef __ARM_FP
+#error  __ARM_FP is not defined but should be
+#endif
+
+
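
For context, a hypothetical sketch of how the resulting __ARM_FP mask can be decoded; the bit values follow the TARGET_ARM_FP comment in the diff above (bit 1 = 16-bit, bit 2 = 32-bit, bit 3 = 64-bit support).  This snippet is illustrative only and not part of the patch:

/* Hypothetical user-side check of the __ARM_FP bit mask.  */
#ifdef __ARM_FP
# if __ARM_FP & 0x08
/* 64-bit (double-precision) hardware floating point available */
# endif
# if __ARM_FP & 0x04
/* 32-bit (single-precision) hardware floating point available */
# endif
# if __ARM_FP & 0x02
/* 16-bit (half-precision) hardware floating point available */
# endif
#else
/* software floating point only: __ARM_FP is not defined */
#endif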

RE: [PATCH] Fix another wrong-code bug with -fstrict-volatile-bitfields

2015-03-05 Thread Bernd Edlinger
Hi,

On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>
> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>  wrote:
>> bounced... again, without html.
>>
>>
>> Hi Richard,
>>
>> while working on another bug in the area of -fstrict-volatile-bitfields
>> I became aware of another example where -fstrict-volatile-bitfields may 
>> generate
>> wrong code. This is reproducible on a !STRICT_ALIGNMENT target like x86_64.
>>
>> The problem is that strict_volatile_bitfield_p tries to allow more than 
>> necessary
>> if !STRICT_ALIGNMENT. Everything works OK on ARM for instance.
>>
>> If this function returns true, we may later call narrow_bit_field_mem, and
>> the check in strict_volatile_bitfield_p should mirror the logic there:
>> narrow_bit_field_mem just uses GET_MODE_BITSIZE (mode) and does not
>> care about STRICT_ALIGNMENT, and in the end *new_bitnum + bitsize may
>> reach beyond the end of the region. This causes store_fixed_bit_field_1
>> to silently fail to generate correct code.
>
> Hmm, but the comment sounds like if using GET_MODE_ALIGNMENT is
> more correct (even for !strict-alignment) - if mode is SImode and mode
> alignment is 16 (HImode aligned) then we don't need to split the load
> if bitnum is 16 and bitsize is 32.
>
> So maybe narrow_bit_field_mem needs to be fixed as well?
>

I'd rather not touch that function.

In the whole of expmed.c the only place where GET_MODE_ALIGNMENT
is used is in simple_mem_bitfield_p, and only if SLOW_UNALIGNED_ACCESS
returns 1, which is the case on only a few targets.
Do you know of any targets where GET_MODE_ALIGNMENT may be less than
GET_MODE_BITSIZE?

Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.

Maybe it should check that MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (fieldmode),
because the strict volatile bitfields handling will inevitably try to use
the fieldmode to access the memory.

Or would it be better to say MEM_ALIGN (op0) >= GET_MODE_BITSIZE (fieldmode),
because it is easier to explain, when someone asks, when we guarantee the
semantics of strict volatile bitfields?

Probably there is already something in the logic in expr.c that prevents these
cases, because otherwise it would be way too easy to find an example of
unaligned accesses to unaligned memory on STRICT_ALIGNMENT targets.


Ok, what would you think about this variant?

--- expmed.c.jj    2015-01-16 11:20:40.0 +0100
+++ expmed.c    2015-03-05 11:50:09.40016 +0100
@@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
 return false;
 
   /* Check for cases of unaligned fields that must be split.  */
-  if (bitnum % BITS_PER_UNIT + bitsize > modesize
-  || (STRICT_ALIGNMENT
-      && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
+  if (bitnum % modesize + bitsize > modesize)
+    return false;
+
+  /* Check that the memory is sufficiently aligned.  */
+  if (MEM_ALIGN (op0) < modesize)
 return false;
 
   /* Check for cases where the C++ memory model applies.  */


Trying to use an atomic access to a device register is pointless if the
memory context is not aligned to the MODE_BITSIZE; that has nothing
to do with MODE_ALIGNMENT, right?
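
For concreteness, a hypothetical layout of the kind the new check is meant to reject (not the attached test case): the field occupies bits 8..39, so bitnum % BITS_PER_UNIT + bitsize == 32 passes the old check on !STRICT_ALIGNMENT targets, while bitnum % modesize + bitsize == 40 > 32 shows that a single SImode access rooted at the containing word cannot cover the field:

/* Hypothetical layout, for illustration only.  */
struct __attribute__ ((packed)) S
{
  unsigned char pad;            /* the bitfield below starts at bit 8 */
  volatile unsigned int f : 32; /* bits 8..39 */
};

void
set_f (struct S *p, unsigned int v)
{
  p->f = v;
}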


Thanks
Bernd.

  

Re: [PATCH, stage1] Move insns without introducing new temporaries in loop2_invariant

2015-03-05 Thread Richard Biener
On Thu, Mar 5, 2015 at 10:53 AM, Thomas Preud'homme
 wrote:
> Note: this is stage1 material.
>
> Currently the loop2_invariant pass hoists instructions out of loops by creating
> a new temporary for the destination register of that instruction and leaving
> behind a mov from the new temporary to the old register, as shown below:
>
> loop header
> start of loop body
> //stuff
> (set (reg 128) (const_int 0))
> //other stuff
> end of loop body
>
> becomes:
>
> (set (reg 129) (const_int 0))
> loop header
> start of loop body
> //stuff
> (set (reg 128) (reg 129))
> //other stuff
> end of loop body
>
> This is one of the errors that led to a useless ldr ending up inside a loop
> (PR64616).  This patch fixes this specific bit (some other bit was fixed in [1])
> by simply moving the instruction if that is known to be safe.  This is decided by
> looking at all the uses of the register set in the instruction and checking
> that (i) they are all dominated by the instruction and (ii) there is no
> other def in the loop that could end up reaching one of the uses.

Why doesn't copy-propagation clean this up?  It's run after loop2.

Richard.

> [1] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00933.html
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2015-02-16  Thomas Preud'homme  
>
> * dominance.c (nearest_common_dominator_for_set): Fix A_Dominated_by_B
> code in comment.
> * loop-invariant.c (can_move_invariant_reg): New.
> (move_invariant_reg): Call above new function to decide whether
> instruction can just be moved, skipping creation of temporary
> register.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2015-02-16  Thomas Preud'homme  
>
> * gcc.dg/loop-7.c: Run on all targets and check for loop2_invariant
> being able to move instructions without introducing new temporary
> register.
> * gcc.dg/loop-8.c: New test.
>
> diff --git a/gcc/dominance.c b/gcc/dominance.c
> index 33d4ae4..09c8c90 100644
> --- a/gcc/dominance.c
> +++ b/gcc/dominance.c
> @@ -982,7 +982,7 @@ nearest_common_dominator_for_set (enum cdi_direction dir, 
> bitmap blocks)
>
> A_Dominated_by_B (node A, node B)
> {
> - return DFS_Number_In(A) >= DFS_Number_In(A)
> + return DFS_Number_In(A) >= DFS_Number_In(B)
>  && DFS_Number_Out (A) <= DFS_Number_Out(B);
> }  */
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index f79b497..ab2a45c 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1512,6 +1512,99 @@ replace_uses (struct invariant *inv, rtx reg, bool 
> in_group)
>return 1;
>  }
>
> +/* Whether invariant INV setting REG can be moved out of LOOP, at the end of
> +   the block preceding its header.  */
> +
> +static bool
> +can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg)
> +{
> +  df_ref def, use;
> +  bool ret = false;
> +  unsigned int i, dest_regno, defs_in_loop_count = 0;
> +  rtx_insn *insn = inv->insn;
> +  bitmap may_exit, has_exit, always_executed;
> +  basic_block *body, bb = BLOCK_FOR_INSN (inv->insn);
> +
> +  /* We ignore hard register and memory access for cost and complexity 
> reasons.
> + Hard register are few at this stage and expensive to consider as they
> + require building a separate data flow.  Memory access would require 
> using
> + df_simulate_* and can_move_insns_across functions and is more complex.  
> */
> +  if (!REG_P (reg) || HARD_REGISTER_P (reg))
> +return false;
> +
> +  /* Check whether the set is always executed.  We could omit this condition 
> if
> + we know that the register is unused outside of the loop, but it does not
> + seem worth finding out.  */
> +  may_exit = BITMAP_ALLOC (NULL);
> +  has_exit = BITMAP_ALLOC (NULL);
> +  always_executed = BITMAP_ALLOC (NULL);
> +  body = get_loop_body_in_dom_order (loop);
> +  find_exits (loop, body, may_exit, has_exit);
> +  compute_always_reached (loop, body, has_exit, always_executed);
> +  /* Find bit position for basic block bb.  */
> +  for (i = 0; i < loop->num_nodes && body[i] != bb; i++);
> +  if (!bitmap_bit_p (always_executed, i))
> +goto cleanup;
> +
> +  /* Check that all uses reached by the def in insn would still be reached
> + it.  */
> +  dest_regno = REGNO (reg);
> +  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
> +{
> +  rtx ref;
> +  basic_block use_bb;
> +
> +  ref = DF_REF_INSN (use);
> +  use_bb = BLOCK_FOR_INSN (ref);
> +
> +  /* Ignore instruction considered for moving.  */
> +  if (ref == insn)
> +   continue;
> +
> +  /* Don't consider uses outside loop.  */
> +  if (!flow_bb_inside_loop_p (loop, use_bb))
> +   continue;
> +
> +  /* Don't move if a use is not dominated by def in insn.  */
> +  if (use_bb == bb && DF_INSN_LUID (insn) > DF_INSN_LUID (ref))
> +   goto cleanup;
> +  if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
> +   goto clea

Re: [PATCH] Fix another wrong-code bug with -fstrict-volatile-bitfields

2015-03-05 Thread Richard Biener
On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
 wrote:
> Hi,
>
> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>
>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>>  wrote:
>>> bounced... again, without html.
>>>
>>>
>>> Hi Richard,
>>>
>>> while working on another bug in the area of -fstrict-volatile-bitfields
>>> I became aware of another example where -fstrict-volatile-bitfields may 
>>> generate
>>> wrong code. This is reproducible on a !STRICT_ALIGNMENT target like x86_64.
>>>
>>> The problem is that strict_volatile_bitfield_p tries to allow more than 
>>> necessary
>>> if !STRICT_ALIGNMENT. Everything works OK on ARM for instance.
>>>
>>> If this function returns true, we may later call narrow_bit_field_mem, and
>>> the check in strict_volatile_bitfield_p should mirror the logic there:
>>> narrow_bit_field_mem just uses GET_MODE_BITSIZE (mode) and does not
>>> care about STRICT_ALIGNMENT, and in the end *new_bitnum + bitsize may
>>> reach beyond the end of the region. This causes store_fixed_bit_field_1
>>> to silently fail to generate correct code.
>>
>> Hmm, but the comment sounds like if using GET_MODE_ALIGNMENT is
>> more correct (even for !strict-alignment) - if mode is SImode and mode
>> alignment is 16 (HImode aligned) then we don't need to split the load
>> if bitnum is 16 and bitsize is 32.
>>
>> So maybe narrow_bit_field_mem needs to be fixed as well?
>>
>
> I'd rather not touch that function.
>
> In the whole of expmed.c the only place where GET_MODE_ALIGNMENT
> is used is in simple_mem_bitfield_p, and only if SLOW_UNALIGNED_ACCESS
> returns 1, which is the case on only a few targets.
> Do you know of any targets where GET_MODE_ALIGNMENT may be less than
> GET_MODE_BITSIZE?

DImode on i?86, I suppose any mode on targets like AVR.
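
For instance (a hypothetical illustration, not from the patch):

/* On 32-bit x86 (i?86) the psABI aligns long long to only 4 bytes, so
   DImode has GET_MODE_BITSIZE == 64 but GET_MODE_ALIGNMENT == 32.  */
struct s
{
  int i;        /* bytes 0-3 */
  long long x;  /* bytes 4-11 with -m32: 4-byte aligned, not 8 */
};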

> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>
> Maybe it should check that MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (fieldmode),
> because the strict volatile bitfields handling will inevitably try to use
> the fieldmode to access the memory.
>
> Or would it be better to say MEM_ALIGN (op0) >= GET_MODE_BITSIZE (fieldmode),
> because it is easier to explain, when someone asks, when we guarantee the
> semantics of strict volatile bitfields?

But on non-strict-align targets you can even for 1-byte aligned MEMs
access an SImode field directly.  So the old code looks correct to me
here and the fix needs to be done somewhere else.

> Probably there is already something in the logic in expr.c that prevents
> these cases, because otherwise it would be way too easy to find an example
> of unaligned accesses to unaligned memory on STRICT_ALIGNMENT targets.
>
>
> Ok, what would you think about this variant?
>
> --- expmed.c.jj2015-01-16 11:20:40.0 +0100
> +++ expmed.c2015-03-05 11:50:09.40016 +0100
> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>  return false;
>
>/* Check for cases of unaligned fields that must be split.  */
> -  if (bitnum % BITS_PER_UNIT + bitsize> modesize
> -  || (STRICT_ALIGNMENT
> -  && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize> modesize))
> +  if (bitnum % modesize + bitsize> modesize)
> +return false;
> +
> +  /* Check that the memory is sufficiently aligned.  */
> +  if (MEM_ALIGN (op0) < modesize)

I think that only applies to strict-align targets and then only for
GET_MODE_ALIGNMENT (modesize).  And of course what matters
is the alignment at bitnum - even though op0 may not be sufficiently
aligned, it may have known misalignment so that op0 + bitnum is
sufficiently aligned.

Testcases would need to annotate structs with packed/aligned attributes
to get at these cases.

For the testcase included in the patch, what does the patch end up doing?
Not going the strict-volatile bitfield expansion path?  That looks unnecessary
on !strict-alignment targets but reasonable on strict-align targets where the
access would need to be split.  So, why does it end up being split
on !strict-align targets?

Richard.


>  return false;
>
>/* Check for cases where the C++ memory model applies.  */
>
>
> Trying to use an atomic access to a device register is pointless if the
> memory context is not aligned to the MODE_BITSIZE, that has nothing
> to do with MODE_ALIGNMENT, right?
>
>
> Thanks
> Bernd.
>
>


[PATCH] S390: Hotpatching fixes.

2015-03-05 Thread Dominik Vogt
S390: Hotpatching fixes.

 * Properly align function labels with -mhotpatch and add test cases.
 * Include the nops after the function label in the area covered by cfi and
   debug information.
 * Correct a typo in the documentation.
 * Fix formatting in the generated 6-byte-NOP and adapt the test cases.
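
For context, a hypothetical example of requesting the NOP areas on a per-function basis (assuming the two-argument S/390 hotpatch attribute matching the hw_before/hw_after halfword counts used in the patch; the numbers are arbitrary):

/* Hypothetical usage sketch: reserve 1 halfword of NOPs before and 2
   halfwords after the function label for hot patching.  */
__attribute__ ((hotpatch (1, 2)))
void
patch_me (void)
{
  /* function body */
}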

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
From 2d42c989a83fac102294ebdff6e68ca4bd571915 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 23 Feb 2015 13:48:26 +0100
Subject: [PATCH] S390: Hotpatching fixes.

 * Properly align function labels with -mhotpatch and add test cases.
 * Include the nops after the function label in the area covered by cfi and
   debug information.
 * Correct a typo in the documentation.
 * Fix formatting in the generated 6-byte-NOP and adapt the test cases.
---
 gcc/config/s390/s390.c  | 65 ++---
 gcc/config/s390/s390.md | 28 +
 gcc/doc/invoke.texi |  4 +-
 gcc/testsuite/gcc.target/s390/hotpatch-1.c  |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-10.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-11.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-12.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-13.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-14.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-15.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-16.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-17.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-18.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-19.c |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-2.c  |  3 +-
 gcc/testsuite/gcc.target/s390/hotpatch-21.c | 14 +++
 gcc/testsuite/gcc.target/s390/hotpatch-22.c | 14 +++
 gcc/testsuite/gcc.target/s390/hotpatch-23.c | 14 +++
 gcc/testsuite/gcc.target/s390/hotpatch-24.c | 14 +++
 gcc/testsuite/gcc.target/s390/hotpatch-3.c  |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-4.c  |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-5.c  |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-6.c  |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-7.c  |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-8.c  |  2 +-
 gcc/testsuite/gcc.target/s390/hotpatch-9.c  |  2 +-
 26 files changed, 146 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-21.c
 create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-22.c
 create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-23.c
 create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-24.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 1924f2a..bac3555 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -5295,6 +5295,7 @@ s390_asm_output_function_label (FILE *asm_out_file, const char *fname,
 
   if (hotpatch_p)
 {
+  int function_alignment;
   int i;
 
   /* Add a trampoline code area before the function label and initialize it
@@ -5308,34 +5309,13 @@ s390_asm_output_function_label (FILE *asm_out_file, const char *fname,
 	 stored directly before the label without crossing a cacheline
 	 boundary.  All this is necessary to make sure the trampoline code can
 	 be changed atomically.  */
+  function_alignment = MAX (8, DECL_ALIGN (decl) / BITS_PER_UNIT);
+  if (! DECL_USER_ALIGN (decl))
+	function_alignment = MAX (function_alignment, align_functions);
+  ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (function_alignment));
 }
 
   ASM_OUTPUT_LABEL (asm_out_file, fname);
-
-  /* Output a series of NOPs after the function label.  */
-  if (hotpatch_p)
-{
-  while (hw_after > 0)
-	{
-	  if (hw_after >= 3 && TARGET_CPU_ZARCH)
-	{
-	  asm_fprintf (asm_out_file, "\tbrcl\t\t0,0\n");
-	  hw_after -= 3;
-	}
-	  else if (hw_after >= 2)
-	{
-	  gcc_assert (hw_after == 2 || !TARGET_CPU_ZARCH);
-	  asm_fprintf (asm_out_file, "\tnop\t0\n");
-	  hw_after -= 2;
-	}
-	  else
-	{
-	  gcc_assert (hw_after == 1);
-	  asm_fprintf (asm_out_file, "\tnopr\t%%r7\n");
-	  hw_after -= 1;
-	}
-	}
-}
 }
 
 /* Output machine-dependent UNSPECs occurring in address constant X
@@ -11368,6 +11348,7 @@ static void
 s390_reorg (void)
 {
   bool pool_overflow = false;
+  int hw_before, hw_after;
 
   /* Make sure all splits have been performed; splits after
  machine_dependent_reorg might confuse insn length counts.  */
@@ -11503,6 +11484,40 @@ s390_reorg (void)
   if (insn_added_p)
 	shorten_branches (get_insns ());
 }
+
+  s390_function_num_hotpatch_hw (current_function_decl, &hw_before, &hw_after);
+  if (hw_after > 0)
+{
+  rtx_insn *insn;
+
+  /* Inject nops for hotpatching. */
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+	{
+	  if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
+	break;
+	}
+  gcc_assert (insn);
+  /* Output a series of NOPs after the NOTE_INSN_FUNCTION_BEG.  */
+  while (hw_after > 0)
+	{

[PATCH] [ARM] Fix widen-sum pattern in neon.md.

2015-03-05 Thread Xingxing Pan

Hi,

Expanding the widen-sum pattern always fails: the vectorizer expects
the operands to have the same size, while the current implementation of
the widen-sum pattern does not conform to this.


This patch implements the widen-sum pattern with vpadal, changes the
vaddw pattern to anonymous, and adds widen-sum test cases for NEON.
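
For reference, a typical widening-sum loop the pattern is meant to vectorize (hypothetical example; the committed tests are the vect-widen-sum-* files listed below):

/* Hypothetical widening-sum kernel: chars are summed into a wider
   (short) accumulator, which is what widen_ssum / vpadal covers.  */
short
widen_sum (const signed char *p, int n)
{
  short sum = 0;
  for (int i = 0; i < n; i++)
    sum += p[i];
  return sum;
}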


--
Regards,
Xingxing
commit 62637f371a3329ff56644526bc5dbf9356cbdd6c
Author: Xingxing Pan 
Date:   Wed Feb 25 16:44:25 2015 +0800

Fix widen-sum pattern in neon.md.

2015-03-05  Xingxing Pan  

config/arm/
* iterators.md:
(VWSD): New define_mode_iterator.
(V_widen_sum_d): New define_mode_attr.
* neon.md
(widen_ssum3): Redefined.
(widen_usum3): Ditto.
(neon_svaddw3): New anonymous define_insn.
(neon_uvaddw3): Ditto.
testsuite/gcc.target/arm/neon/
* vect-widen-sum-char2short-s-d.c: New file.
* vect-widen-sum-char2short-s.c: Ditto.
* vect-widen-sum-char2short-u-d.c: Ditto.
* vect-widen-sum-char2short-u.c: Ditto.
* vect-widen-sum-short2int-s-d.c: Ditto.
* vect-widen-sum-short2int-s.c: Ditto.
* vect-widen-sum-short2int-u-d.c: Ditto.
* vect-widen-sum-short2int-u.c: Ditto.
testsuite/lib/
* target-supports.exp:
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Return 1 for ARM NEON.
(check_effective_target_vect_widen_sum_hi_to_si): Ditto.
(check_effective_target_vect_widen_sum_qi_to_hi): Ditto.

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index f7f8ab7..4ba5901 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -95,6 +95,9 @@
 ;; Widenable modes.
 (define_mode_iterator VW [V8QI V4HI V2SI])
 
+;; Widenable modes. Used by widen sum.
+(define_mode_iterator VWSD [V8QI V4HI V16QI V8HI])
+
 ;; Narrowable modes.
 (define_mode_iterator VN [V8HI V4SI V2DI])
 
@@ -558,6 +561,11 @@
 ;; Widen. Result is half the number of elements, but widened to double-width.
 (define_mode_attr V_unpack   [(V16QI "V8HI") (V8HI "V4SI") (V4SI "V2DI")])
 
+;; Widen. Result is half the number of elements, but widened to double-width.
+;; Used by widen sum.
+(define_mode_attr V_widen_sum_d [(V8QI "V4HI") (V4HI "V2SI")
+ (V16QI "V8HI") (V8HI "V4SI")])
+
 ;; Conditions to be used in extenddi patterns.
 (define_mode_attr qhs_zextenddi_cond [(SI "") (HI "&& arm_arch6") (QI "")])
 (define_mode_attr qhs_sextenddi_cond [(SI "") (HI "&& arm_arch6")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 63c327e..6cac36d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,7 +1174,31 @@
 
 ;; Widening operations
 
-(define_insn "widen_ssum3"
+(define_expand "widen_usum3"
+ [(match_operand: 0 "s_register_operand" "")
+  (match_operand:VWSD 1 "s_register_operand" "")
+  (match_operand: 2 "s_register_operand" "")]
+  "TARGET_NEON"
+  {
+emit_move_insn(operands[0], operands[2]);
+emit_insn (gen_neon_vpadalu (operands[0], operands[0], operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_ssum3"
+ [(match_operand: 0 "s_register_operand" "")
+  (match_operand:VWSD 1 "s_register_operand" "")
+  (match_operand: 2 "s_register_operand" "")]
+  "TARGET_NEON"
+  {
+emit_move_insn(operands[0], operands[2]);
+emit_insn (gen_neon_vpadals (operands[0], operands[0], operands[1]));
+DONE;
+  }
+)
+
+(define_insn "*neon_svaddw3"
   [(set (match_operand: 0 "s_register_operand" "=w")
 	(plus: (sign_extend:
 			  (match_operand:VW 1 "s_register_operand" "%w"))
@@ -1184,7 +1208,7 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
-(define_insn "widen_usum3"
+(define_insn "*neon_uvaddw3"
   [(set (match_operand: 0 "s_register_operand" "=w")
 	(plus: (zero_extend:
 			  (match_operand:VW 1 "s_register_operand" "%w"))
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s-d.c b/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s-d.c
new file mode 100644
index 000..c81c325
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s-d.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mvectorize-with-neon-double -fdump-tree-vect-details -fdump-rtl-expand" } */
+/* { dg-add-options arm_neon } */
+
+/* { dg-final { scan-tree-dump-times "pattern recognized.*w\\\+" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+/* { dg-final { scan-rtl-dump-times "UNSPEC_VPADAL" 1 "expand" { xfail *-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
+
+#include 
+
+typedef signed char STYPE1;
+typedef signed short STYPE2;
+
+#define N 128
+STYPE1 sdata[N];
+
+volatile int y = 0;
+
+__attribute__ ((noinline)) int
+ssum ()
+{
+  int i;
+  STYPE2 sum = 0;
+  STYPE2 check_sum = 0;
+
+  /* widening sum: sum chars into short.
+
+ Like gcc.dg/vect/vect

Re: [patch] libstdc++/64797 fix handling of incomplete multibyte characters

2015-03-05 Thread Jonathan Wakely

On 04/03/15 17:20 +, Jonathan Wakely wrote:

To fix the non-portable 22_locale/conversions/string/2.cc test I
changed it to use char16_t and char32_t where I can reliably create an
invalid sequence that causes a conversion error. That revealed some
more problems in the Unicode conversion utilities, fixed by this patch
and verified by the new tests.

Most of the changes in codecvt.cc are just defining convenience
constants and inline functions, but there are some minor bugs fixed in
UTF-16 error handling too.


[...]

* testsuite/22_locale/conversions/string/2.cc: Use char16_t and
char32_t instead of wchar_t.
* testsuite/22_locale/conversions/string/3.cc: New.


I changed the 22_locale/conversions/string/2.cc and
22_locale/conversions/string/3.cc tests to use UTF-8 as well as UTF-16
and UTF-32 and that revealed another problem in wstring_convert: I
wasn't handling the noconv case. Fixed by this patch.

Tested x86_64-linux, committed to trunk.

commit 1e6eeea711f42aa908e4da8064bc9f4e1859b6bd
Author: Jonathan Wakely 
Date:   Thu Mar 5 12:26:25 2015 +

	* include/bits/locale_conv.h (wstring_convert::_M_conv): Handle
	noconv result.
	* testsuite/22_locale/conversions/string/2.cc: Also test UTF-8.
	* testsuite/22_locale/conversions/string/3.cc: Likewise, and UTF-16.

diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h
index b53754d..9b49617 100644
--- a/libstdc++-v3/include/bits/locale_conv.h
+++ b/libstdc++-v3/include/bits/locale_conv.h
@@ -213,6 +213,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  while (__result == codecvt_base::partial && __next != __last
 		 && (__outstr.size() - __outchars) < __maxlen);
 
+	  if (__result == codecvt_base::noconv)
+	{
+	  __outstr.assign(__first, __last);
+	  _M_count = __outstr.size();
+	  return __outstr;
+	}
+
 	  __outstr.resize(__outchars);
 	  _M_count = __next - __first;
 
diff --git a/libstdc++-v3/testsuite/22_locale/conversions/string/2.cc b/libstdc++-v3/testsuite/22_locale/conversions/string/2.cc
index 07d2b52..9341f892 100644
--- a/libstdc++-v3/testsuite/22_locale/conversions/string/2.cc
+++ b/libstdc++-v3/testsuite/22_locale/conversions/string/2.cc
@@ -37,6 +37,24 @@ using std::u32string;
 
 void test01()
 {
+  typedef str_conv sc;
+
+  const sc::byte_string berr = "invalid wide string";
+  const sc::wide_string werr = u8"invalid byte string";
+
+  sc c(berr, werr);
+  string input = "Stop";
+  input += char(0xFF);
+  string woutput = c.from_bytes(input);
+  VERIFY( input == woutput ); // noconv case doesn't detect invalid input
+  string winput = u8"Stop";
+  winput += char(0xFF);
+  string output = c.to_bytes(winput);
+  VERIFY( winput == output ); // noconv case doesn't detect invalid input
+}
+
+void test02()
+{
   typedef str_conv sc;
 
   const sc::byte_string berr = "invalid wide string";
@@ -53,7 +71,7 @@ void test01()
   VERIFY( berr == output );
 }
 
-void test02()
+void test03()
 {
   typedef str_conv sc;
 
@@ -75,4 +93,5 @@ int main()
 {
   test01();
   test02();
+  test03();
 }
diff --git a/libstdc++-v3/testsuite/22_locale/conversions/string/3.cc b/libstdc++-v3/testsuite/22_locale/conversions/string/3.cc
index 7c4ac20..6afa62b 100644
--- a/libstdc++-v3/testsuite/22_locale/conversions/string/3.cc
+++ b/libstdc++-v3/testsuite/22_locale/conversions/string/3.cc
@@ -30,12 +30,55 @@ template
 using str_conv = std::wstring_convert, Elem>;
 
 using std::string;
+using std::u16string;
 using std::u32string;
 
 // test construction with state, for partial conversions
 
 void test01()
 {
+  typedef str_conv wsc;
+
+  wsc c;
+  string input = u8"\u00a3 shillings pence";
+  string woutput = c.from_bytes(input.substr(0, 1));
+  auto partial_state = c.state();
+  auto partial_count = c.converted();
+
+  auto woutput2 = c.from_bytes(u8"state reset on next conversion");
+  VERIFY( woutput2 == u8"state reset on next conversion" );
+
+  wsc c2(new cvt, partial_state);
+  woutput += c2.from_bytes(input.substr(partial_count));
+  VERIFY( u8"\u00a3 shillings pence" == woutput );
+
+  string roundtrip = c2.to_bytes(woutput);
+  VERIFY( input == roundtrip );
+}
+
+void test02()
+{
+  typedef str_conv wsc;
+
+  wsc c;
+  string input = u8"\u00a3 shillings pence";
+  u16string woutput = c.from_bytes(input.substr(0, 1));
+  auto partial_state = c.state();
+  auto partial_count = c.converted();
+
+  auto woutput2 = c.from_bytes(u8"state reset on next conversion");
+  VERIFY( woutput2 == u"state reset on next conversion" );
+
+  wsc c2(new cvt, partial_state);
+  woutput += c2.from_bytes(input.substr(partial_count));
+  VERIFY( u"\u00a3 shillings pence" == woutput );
+
+  string roundtrip = c2.to_bytes(woutput);
+  VERIFY( input == roundtrip );
+}
+
+void test03()
+{
   typedef str_conv wsc;
 
   wsc c;
@@ -44,7 +87,7 @@ void test01()
   auto partial_state = c.state();
   auto partial_count = c.converted();
 
-  auto woutput2 = c.from_bytes("state reset on next conversion"

Re: [PATCH] [ARM] Fix widen-sum pattern in neon.md.

2015-03-05 Thread Kyrill Tkachov


On 05/03/15 13:34, Xingxing Pan wrote:

Hi,

Hi Xingxing,
Thanks for improving this! Some comments inline.



Expanding the widen-sum pattern always fails: the vectorizer expects
the operands to have the same size, while the current implementation of
the widen-sum pattern does not conform to this.

This patch implements the widen-sum pattern with vpadal, changes the
vaddw pattern to anonymous, and adds widen-sum test cases for NEON.


How has this been tested? Bootstrap and testsuite?


-- Regards, Xingxing

fix-widen-sum.patch


commit 62637f371a3329ff56644526bc5dbf9356cbdd6c
Author: Xingxing Pan
Date:   Wed Feb 25 16:44:25 2015 +0800

 Fix widen-sum pattern in neon.md.

 2015-03-05  Xingxing Pan
 
 config/arm/

 * iterators.md:
 (VWSD): New define_mode_iterator.
 (V_widen_sum_d): New define_mode_attr.
 * neon.md
 (widen_ssum3): Redefined.
 (widen_usum3): Ditto.
 (neon_svaddw3): New anonymous define_insn.
 (neon_uvaddw3): Ditto.


Please use proper ChangeLog format:
* config/arm/iterators.md (VWSD): New.

and so on. Separate ChangeLog for the testsuite.


 testsuite/gcc.target/arm/neon/
 * vect-widen-sum-char2short-s-d.c: New file.
 * vect-widen-sum-char2short-s.c: Ditto.
 * vect-widen-sum-char2short-u-d.c: Ditto.
 * vect-widen-sum-char2short-u.c: Ditto.
 * vect-widen-sum-short2int-s-d.c: Ditto.
 * vect-widen-sum-short2int-s.c: Ditto.
 * vect-widen-sum-short2int-u-d.c: Ditto.
 * vect-widen-sum-short2int-u.c: Ditto.
 testsuite/lib/
 * target-supports.exp:
 (check_effective_target_vect_widen_sum_hi_to_si_pattern): Return 1 for 
ARM NEON.
 (check_effective_target_vect_widen_sum_hi_to_si): Ditto.
 (check_effective_target_vect_widen_sum_qi_to_hi): Ditto.

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index f7f8ab7..4ba5901 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -95,6 +95,9 @@
  ;; Widenable modes.
  (define_mode_iterator VW [V8QI V4HI V2SI])
  
+;; Widenable modes. Used by widen sum.

+(define_mode_iterator VWSD [V8QI V4HI V16QI V8HI])


Two spaces after full stop in comment.


+
  ;; Narrowable modes.
  (define_mode_iterator VN [V8HI V4SI V2DI])
  
@@ -558,6 +561,11 @@

  ;; Widen. Result is half the number of elements, but widened to double-width.
  (define_mode_attr V_unpack   [(V16QI "V8HI") (V8HI "V4SI") (V4SI "V2DI")])
  
+;; Widen. Result is half the number of elements, but widened to double-width.

+;; Used by widen sum.

Likewise.


+(define_mode_attr V_widen_sum_d [(V8QI "V4HI") (V4HI "V2SI")
+ (V16QI "V8HI") (V8HI "V4SI")])
+
  ;; Conditions to be used in extenddi patterns.
  (define_mode_attr qhs_zextenddi_cond [(SI "") (HI "&& arm_arch6") (QI "")])
  (define_mode_attr qhs_sextenddi_cond [(SI "") (HI "&& arm_arch6")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 63c327e..6cac36d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,7 +1174,31 @@
  
  ;; Widening operations
  
-(define_insn "widen_ssum3"

+(define_expand "widen_usum3"
+ [(match_operand: 0 "s_register_operand" "")
+  (match_operand:VWSD 1 "s_register_operand" "")
+  (match_operand: 2 "s_register_operand" "")]
+  "TARGET_NEON"
+  {
+emit_move_insn(operands[0], operands[2]);
+emit_insn (gen_neon_vpadalu (operands[0], operands[0], operands[1]));
+DONE;
+  }
+)


Is the move from operands[2] to operands[0] necessary?
Can you take advantage of the fact that neon_vpadal has
"0" in it's constraint, thus making register-allocation tie the operands 
to the same register?



+
+(define_expand "widen_ssum3"
+ [(match_operand: 0 "s_register_operand" "")
+  (match_operand:VWSD 1 "s_register_operand" "")
+  (match_operand: 2 "s_register_operand" "")]
+  "TARGET_NEON"
+  {
+emit_move_insn(operands[0], operands[2]);
+emit_insn (gen_neon_vpadals (operands[0], operands[0], operands[1]));
+DONE;
+  }
+)
+
+(define_insn "*neon_svaddw3"
[(set (match_operand: 0 "s_register_operand" "=w")
(plus: (sign_extend:
  (match_operand:VW 1 "s_register_operand" "%w"))
@@ -1184,7 +1208,7 @@
[(set_attr "type" "neon_add_widen")]
  )
  
-(define_insn "widen_usum3"

+(define_insn "*neon_uvaddw3"
[(set (match_operand: 0 "s_register_operand" "=w")
(plus: (zero_extend:
  (match_operand:VW 1 "s_register_operand" "%w"))
diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s-d.c 
b/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s-d.c
new file mode 100644
index 000..c81c325
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s-d.c
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -mvectorize-with-neon-d

[PATCH] Fix PR ipa/65318

2015-03-05 Thread Martin Liška

Hello.

This patch prevents the ICF merge operation on variables whose types
are not compatible.
Regression tests were run on x86_64-linux-pc.

Ready for trunk?
Thanks,
Martin
>From b92ec230162b99ff11d4e5688f63ae978e75af12 Mon Sep 17 00:00:00 2001
From: mliska 
Date: Thu, 5 Mar 2015 13:41:07 +0100
Subject: [PATCH] Fix PR ipa/65318.

gcc/ChangeLog:

2015-03-05  Martin Liska  

	PR ipa/65318
	* ipa-icf.c (sem_variable::equals): Compare variable types.

gcc/testsuite/ChangeLog:

2015-03-05  Martin Liska  

	* gcc.dg/ipa/PR65318.c: New test.
---
 gcc/ipa-icf.c  |  5 +
 gcc/testsuite/gcc.dg/ipa/PR65318.c | 18 ++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/PR65318.c

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index c55a09f..1752e67 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1501,6 +1501,11 @@ sem_variable::equals (sem_item *item,
   if (DECL_INITIAL (item->decl) == error_mark_node && in_lto_p)
 dyn_cast (item->node)->get_constructor ();
 
+  /* As seen in PR ipa/65303 we have to compare variable's types.  */
+  if (!func_checker::compatible_types_p(TREE_TYPE (decl),
+	TREE_TYPE (item->decl)))
+return return_false_with_msg ("variable types are different");
+
   ret = sem_variable::equals (DECL_INITIAL (decl),
 			  DECL_INITIAL (item->node->decl));
   if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.dg/ipa/PR65318.c b/gcc/testsuite/gcc.dg/ipa/PR65318.c
new file mode 100644
index 000..f23b3a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/PR65318.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-icf-details"  } */
+
+static short a = 0;
+short b = -1;
+static unsigned short c = 0;
+
+int
+main ()
+{
+  if (a <= b)
+   return 1;
+
+  return 0;
+}
+
+/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf"  } } */
+/* { dg-final { cleanup-ipa-dump "icf" } } */
-- 
2.1.2



Fix PR ada/65319

2015-03-05 Thread Eric Botcazou
This removes obsolete code in c-ada-spec.c that valgrind rightfully complains 
about and should fix the PR in the process.

Tested on x86_64-linux-gnu, applied on the mainline and 4.9 branch.


2015-03-05  Eric Botcazou  

PR ada/65319
* c-ada-spec.c (print_destructor): Remove obsolete code.

-- 
Eric Botcazoudiff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c
index 007c176..9c633b5 100644
--- a/gcc/c-family/c-ada-spec.c
+++ b/gcc/c-family/c-ada-spec.c
@@ -2665,18 +2665,9 @@ static void
 print_destructor (pretty_printer *buffer, tree t)
 {
   tree decl_name = DECL_NAME (DECL_ORIGIN (t));
-  const char *s = IDENTIFIER_POINTER (decl_name);
 
-  if (*s == '_')
-{
-  for (s += 2; *s != ' '; s++)
-	pp_character (buffer, *s);
-}
-  else
-{
-  pp_string (buffer, "Delete_");
-  pp_ada_tree_identifier (buffer, decl_name, t, false);
-}
+  pp_string (buffer, "Delete_");
+  pp_ada_tree_identifier (buffer, decl_name, t, false);
 }
 
 /* Return the name of type T.  */


Re: [PATCH] [ARM] Fix widen-sum pattern in neon.md.

2015-03-05 Thread James Greenhalgh
Hi Xingxing,

I'm a little confused by your reasons for adding testcases marked XFAIL.

On Thu, Mar 05, 2015 at 01:34:25PM +, Xingxing Pan wrote:
> +/* { dg-final { scan-tree-dump-times "pattern recognized.*w\\\+" 1 "vect" { 
> xfail *-*-* } } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> +/* { dg-final { scan-rtl-dump-times "UNSPEC_VPADAL" 1 "expand" { xfail *-*-* 
> } } } */

Why XFAIL here? Maybe I have not properly understood what you are checking
for; can this not be rewritten into something we expect to PASS?

If you are testing that the pattern doesn't get recognized, use:

   { dg-final { scan-tree-dump-not "pattern recognized.*w\\\+" "vect" } }

Or is the reason that the pattern should be recognised in future but
currently is not? 

In any case, a comment on why these tests should be expected to fail
would be useful, even if that just means moving the comment you already
have in the testcase up beside these dg-directives.
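
Something like this, perhaps (just reusing the directive from your testcase):

/* Currently XFAILed: the widening-summation pattern is not recognized
   here, see the note about PR tree-optimization/25125 below.  */
/* { dg-final { scan-tree-dump-times "pattern recognized.*w\\\+" 1 "vect" { xfail *-*-* } } } */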

Thanks,
James

> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> +
> +#include 
> +
> +typedef signed char STYPE1;
> +typedef signed short STYPE2;
> +
> +#define N 128
> +STYPE1 sdata[N];
> +
> +volatile int y = 0;
> +
> +__attribute__ ((noinline)) int
> +ssum ()
> +{
> +  int i;
> +  STYPE2 sum = 0;
> +  STYPE2 check_sum = 0;
> +
> +  /* widenning sum: sum chars into short.
> +
> + Like gcc.dg/vect/vect-reduc-pattern-2c.c, the widening-summation pattern
> + is currently not detected because of this patch:
> +
> + 2005-12-26  Kazu Hirata  
> +PR tree-optimization/25125
> +   */
> +
> +  for (i = 0; i < N; i++)
> +{
> +  sdata[i] = i*2;
> +  check_sum += sdata[i];
> +  /* Avoid vectorization.  */
> +  if (y)
> + abort ();
> +}
> +
> +  /* widenning sum: sum chars into int.  */
> +  for (i = 0; i < N; i++)
> +{
> +  sum += sdata[i];
> +}
> +
> +  /* check results:  */
> +  if (sum != check_sum)
> +abort ();
> +
> +  return 0;
> +}
> +
> +int
> +main (void)
> +{
> +  ssum ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s.c 
> b/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s.c
> new file mode 100644
> index 000..de53f5c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-s.c
> @@ -0,0 +1,64 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-options "-O2 -ffast-math -ftree-vectorize -fdump-tree-vect-details 
> -fdump-rtl-expand" } */
> +/* { dg-add-options arm_neon } */
> +
> +/* { dg-final { scan-tree-dump-times "pattern recognized.*w\\\+" 1 "vect" { 
> xfail *-*-* } } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> +/* { dg-final { scan-rtl-dump-times "UNSPEC_VPADAL" 1 "expand" { xfail *-*-* 
> } } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> +
> +#include 
> +
> +typedef signed char STYPE1;
> +typedef signed short STYPE2;
> +
> +#define N 128
> +STYPE1 sdata[N];
> +
> +volatile int y = 0;
> +
> +__attribute__ ((noinline)) int
> +ssum ()
> +{
> +  int i;
> +  STYPE2 sum = 0;
> +  STYPE2 check_sum = 0;
> +
> +  /* widenning sum: sum chars into short.
> +
> + Like gcc.dg/vect/vect-reduc-pattern-2c.c, the widening-summation pattern
> + is currently not detected because of this patch:
> +
> + 2005-12-26  Kazu Hirata  
> +PR tree-optimization/25125
> +   */
> +
> +  for (i = 0; i < N; i++)
> +{
> +  sdata[i] = i*2;
> +  check_sum += sdata[i];
> +  /* Avoid vectorization.  */
> +  if (y)
> + abort ();
> +}
> +
> +  /* widenning sum: sum chars into int.  */
> +  for (i = 0; i < N; i++)
> +{
> +  sum += sdata[i];
> +}
> +
> +  /* check results:  */
> +  if (sum != check_sum)
> +abort ();
> +
> +  return 0;
> +}
> +
> +int
> +main (void)
> +{
> +  ssum ();
> +  return 0;
> +}
> diff --git 
> a/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-u-d.c 
> b/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-u-d.c
> new file mode 100644
> index 000..bfa17d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/neon/vect-widen-sum-char2short-u-d.c
> @@ -0,0 +1,55 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target arm_neon_hw } */
> +/* { dg-options "-O2 -ffast-math -ftree-vectorize 
> -mvectorize-with-neon-double -fdump-tree-vect-details -fdump-rtl-expand" } */
> +/* { dg-add-options arm_neon } */
> +
> +/* { dg-final { scan-tree-dump-times "pattern recognized.*w\\\+" 1 "vect" { 
> target { arm_neon } } } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> +/* { dg-final { scan-rtl-dump-times "UNSPEC_VPADAL" 1 "expand" { target { 
> arm_neon } } } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> +
> +#include 
> +
> +typedef unsigned char UTYPE1;
> +typedef unsigned short UTYPE2;
> +
> +#define N 128
> +UTYPE1 udata[N];
> +
> +volatile int y = 0;
> +
> +__attribute__ ((noinline)) int
> +usum ()
> +{
> +  int i;
> +  UTYPE2 sum = 0;
> +  UTY

Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Marek Polacek
On Thu, Mar 05, 2015 at 02:53:44PM +0100, Martin Liška wrote:
> --- a/gcc/ipa-icf.c
> +++ b/gcc/ipa-icf.c
> @@ -1501,6 +1501,11 @@ sem_variable::equals (sem_item *item,
>if (DECL_INITIAL (item->decl) == error_mark_node && in_lto_p)
>  dyn_cast (item->node)->get_constructor ();
>  
> +  /* As seen in PR ipa/65303 we have to compare variable's types.  */

"variables"?

> +  if (!func_checker::compatible_types_p(TREE_TYPE (decl),

Missing space before paren.

> + TREE_TYPE (item->decl)))
> +return return_false_with_msg ("variable types are different");

Here "variables" as well?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/PR65318.c

Why PR* and not pr*, which is common?

Marek


Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Martin Liška

On 03/05/2015 03:29 PM, Marek Polacek wrote:

On Thu, Mar 05, 2015 at 02:53:44PM +0100, Martin Liška wrote:

--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1501,6 +1501,11 @@ sem_variable::equals (sem_item *item,
if (DECL_INITIAL (item->decl) == error_mark_node && in_lto_p)
  dyn_cast (item->node)->get_constructor ();

+  /* As seen in PR ipa/65303 we have to compare variable's types.  */


"variables"?


+  if (!func_checker::compatible_types_p(TREE_TYPE (decl),


Missing space before paren.


+   TREE_TYPE (item->decl)))
+return return_false_with_msg ("variable types are different");


Here "variables" as well?


--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/PR65318.c


Why PR* and not pr*, which is common?

Marek



You are right, pr* is more common.
Other nits are fixed in the updated version of the patch.

Thanks,
Martin
>From 3f35d9ec57880409cde384bb7b9e8dbaae5231ef Mon Sep 17 00:00:00 2001
From: mliska 
Date: Thu, 5 Mar 2015 13:41:07 +0100
Subject: [PATCH] Fix PR ipa/65318.

gcc/ChangeLog:

2015-03-05  Martin Liska  

	PR ipa/65318
	* ipa-icf.c (sem_variable::equals): Compare variables types.

gcc/testsuite/ChangeLog:

2015-03-05  Martin Liska  

	* gcc.dg/ipa/pr65318.c: New test.
---
 gcc/ipa-icf.c  |  5 +
 gcc/testsuite/gcc.dg/ipa/pr65318.c | 18 ++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr65318.c

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index c55a09f..a7f19d6 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1501,6 +1501,11 @@ sem_variable::equals (sem_item *item,
   if (DECL_INITIAL (item->decl) == error_mark_node && in_lto_p)
 dyn_cast (item->node)->get_constructor ();
 
+  /* As seen in PR ipa/65303 we have to compare variables types.  */
+  if (!func_checker::compatible_types_p (TREE_TYPE (decl),
+	 TREE_TYPE (item->decl)))
+return return_false_with_msg ("variables types are different");
+
   ret = sem_variable::equals (DECL_INITIAL (decl),
 			  DECL_INITIAL (item->node->decl));
   if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/testsuite/gcc.dg/ipa/pr65318.c b/gcc/testsuite/gcc.dg/ipa/pr65318.c
new file mode 100644
index 000..f23b3a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr65318.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-icf-details"  } */
+
+static short a = 0;
+short b = -1;
+static unsigned short c = 0;
+
+int
+main ()
+{
+  if (a <= b)
+   return 1;
+
+  return 0;
+}
+
+/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf"  } } */
+/* { dg-final { cleanup-ipa-dump "icf" } } */
-- 
2.1.2



Re: RFC: PATCHES: Properly handle reference to protected data on x86

2015-03-05 Thread H.J. Lu
On Wed, Mar 4, 2015 at 3:26 PM, H.J. Lu  wrote:
> Protected symbol means that it can't be pre-empted.  It
> doesn't mean its address won't be external.  This is true
> for pointer to protected function.  With copy relocation,
> address of protected data defined in the shared library may
> also be external.  We only know that for sure at run-time.
> Here are patches for glibc, binutils and GCC to handle it
> properly.
>
> Any comments?

This is the binutils patch I checked in.  It basically reverted
the change for

https://sourceware.org/bugzilla/show_bug.cgi?id=15228

on x86.  Copy relocations against protected symbols should
work.
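
To make the scenario concrete, here is a minimal two-file sketch (my own
illustration with made-up names, not one of the new ld tests):

/* dso.c - built into a shared library.  */
__attribute__ ((visibility ("protected"))) int prot_var = 1;

int *
dso_addr_of_prot_var (void)
{
  return &prot_var;
}

/* main.c - links against the shared library.  */
extern int prot_var;
extern int *dso_addr_of_prot_var (void);

int
main (void)
{
  /* With a copy relocation the executable gets its own copy of
     prot_var, so the address taken here and the address seen inside
     the shared library must still compare equal.  */
  return &prot_var != dso_addr_of_prot_var ();
}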

-- 
H.J.
---
bfd/

PR ld/pr15228
PR ld/pr17709
* elf-bfd.h (elf_backend_data): Add extern_protected_data.
* elf32-i386.c (elf_backend_extern_protected_data): New.
Defined to 1.
* elf64-x86-64.c (elf_backend_extern_protected_data): Likewise.
* elflink.c (_bfd_elf_adjust_dynamic_copy): Don't error on
copy relocs against protected symbols if extern_protected_data
is true.
(_bfd_elf_symbol_refs_local_p): Don't return true on protected
non-function symbols if extern_protected_data is true.
* elfxx-target.h (elf_backend_extern_protected_data): New.
Default to 0.
(elfNN_bed): Initialize extern_protected_data with
elf_backend_extern_protected_data.

ld/testsuite/

PR ld/pr15228
PR ld/pr17709
* ld-i386/i386.exp (i386tests): Add a test for PR ld/17709.
* ld-i386/pr17709-nacl.rd: New file.
* ld-i386/pr17709.rd: Likewise.
* ld-i386/pr17709a.s: Likewise.
* ld-i386/pr17709b.s: Likewise.
* ld-i386/protected3.d: Updated.
* ld-i386/protected3.s: Likewise.
* ld-x86-64/pr17709-nacl.rd: New file.
* ld-x86-64/pr17709.rd: Likewise.
* ld-x86-64/pr17709a.s: Likewise.
* ld-x86-64/pr17709b.s: Likewise.
* ld-x86-64/protected3.d: Updated.
* ld-x86-64/protected3.s: Likewise.
* ld-x86-64/x86-64.exp (x86_64tests): Add a test for PR ld/17709.
From ca3fe95e469b9daec153caa2c90665f5daaec2b5 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 5 Mar 2015 06:34:39 -0800
Subject: [PATCH] Add extern_protected_data and set it for x86

With copy relocation, address of protected data defined in the shared
library may be external.  This patch adds extern_protected_data and
changes _bfd_elf_symbol_refs_local_p to return false for protected data
if extern_protected_data is true.

bfd/

	PR ld/pr15228
	PR ld/pr17709
	* elf-bfd.h (elf_backend_data): Add extern_protected_data.
	* elf32-i386.c (elf_backend_extern_protected_data): New.
	Defined to 1.
	* elf64-x86-64.c (elf_backend_extern_protected_data): Likewise.
	* elflink.c (_bfd_elf_adjust_dynamic_copy): Don't error on
	copy relocs against protected symbols if extern_protected_data
	is true.
	(_bfd_elf_symbol_refs_local_p): Don't return true on protected
	non-function symbols if extern_protected_data is true.
	* elfxx-target.h (elf_backend_extern_protected_data): New.
	Default to 0.
	(elfNN_bed): Initialize extern_protected_data with
	elf_backend_extern_protected_data.

ld/testsuite/

	PR ld/pr15228
	PR ld/pr17709
	* ld-i386/i386.exp (i386tests): Add a test for PR ld/17709.
	* ld-i386/pr17709-nacl.rd: New file.
	* ld-i386/pr17709.rd: Likewise.
	* ld-i386/pr17709a.s: Likewise.
	* ld-i386/pr17709b.s: Likewise.
	* ld-i386/protected3.d: Updated.
	* ld-i386/protected3.s: Likewise.
	* ld-x86-64/pr17709-nacl.rd: New file.
	* ld-x86-64/pr17709.rd: Likewise.
	* ld-x86-64/pr17709a.s: Likewise.
	* ld-x86-64/pr17709b.s: Likewise.
	* ld-x86-64/protected3.d: Updated.
	* ld-x86-64/protected3.s: Likewise.
	* ld-x86-64/x86-64.exp (x86_64tests): Add a test for PR ld/17709.
---
 bfd/ChangeLog  | 18 ++
 bfd/elf-bfd.h  |  4 
 bfd/elf32-i386.c   |  1 +
 bfd/elf64-x86-64.c |  1 +
 bfd/elflink.c  |  9 ++---
 bfd/elfxx-target.h |  6 +-
 ld/testsuite/ChangeLog | 19 +++
 ld/testsuite/ld-i386/i386.exp  |  4 
 ld/testsuite/ld-i386/pr17709-nacl.rd   |  4 
 ld/testsuite/ld-i386/pr17709.rd|  4 
 ld/testsuite/ld-i386/pr17709a.s|  8 
 ld/testsuite/ld-i386/pr17709b.s|  5 +
 ld/testsuite/ld-i386/protected3.d  |  3 ++-
 ld/testsuite/ld-i386/protected3.s  |  3 ++-
 ld/testsuite/ld-x86-64/pr17709-nacl.rd |  4 
 ld/testsuite/ld-x86-64/pr17709.rd  |  4 
 ld/testsuite/ld-x86-64/pr17709a.s  |  8 
 ld/testsuite/ld-x86-64/pr17709b.s  |  5 +
 ld/testsuite/ld-x86-64/protected3.d|  3 ++-
 ld/testsuite/ld-x86-64/protected3.s|  3 ++-
 ld/testsuite/ld-x86-64/x86-64.exp  |  4 
 21 files changed, 112 insertions(+), 8 deletions(-)
 create mode 100644 ld/testsuite/ld-i386/pr17709-nacl.rd
 create mode 100644 ld/testsuite/ld-i386/pr17709.rd
 create mode 100644 ld/testsuite/ld-i386/pr17709a.s
 create mode 100644 ld/testsuite/ld-i386/pr17709b.s
 create mode 100644 ld/testsuite/ld-x86-64/pr17709-nacl.rd
 create mode 100644 ld/testsu

[PATCH][AArch64] Fix Cortex-A53 shift costs

2015-03-05 Thread Wilco Dijkstra
This patch fixes the shift costs for Cortex-A53 so they are more accurate:
immediate shifts use SBFM/UBFM, which take 2 cycles, while register-controlled
shifts take 1 cycle.  Bootstrap and regression OK.
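
For reference, a small C sketch of the two kinds of shift the table
distinguishes (illustration only, not part of the patch):

unsigned long
shift_imm (unsigned long x)
{
  return x << 3;   /* immediate shift, an alias of UBFM */
}

unsigned long
shift_reg (unsigned long x, unsigned long n)
{
  return x << n;   /* register-controlled shift (LSLV) */
}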

ChangeLog: 
2015-03-05  Wilco Dijkstra  

* gcc/config/arm/aarch-cost-tables.h (cortexa53_extra_costs):
Make Cortex-A53 shift costs more accurate.

---
 gcc/config/arm/aarch-cost-tables.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/aarch-cost-tables.h 
b/gcc/config/arm/aarch-cost-tables.h
index 05e96a9..6bb8ede 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -130,12 +130,12 @@ const struct cpu_cost_table cortexa53_extra_costs =
 0, /* arith.  */
 0, /* logical.  */
 COSTS_N_INSNS (1), /* shift.  */
-COSTS_N_INSNS (2), /* shift_reg.  */
+0, /* shift_reg.  */
 COSTS_N_INSNS (1), /* arith_shift.  */
-COSTS_N_INSNS (2), /* arith_shift_reg.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
 COSTS_N_INSNS (1), /* log_shift.  */
-COSTS_N_INSNS (2), /* log_shift_reg.  */
-0, /* extend.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+COSTS_N_INSNS (1), /* extend.  */
 COSTS_N_INSNS (1), /* extend_arith.  */
 COSTS_N_INSNS (1), /* bfi.  */
 COSTS_N_INSNS (1), /* bfx.  */
-- 
1.9.1





Re: [PATCH] PR rtl-optimization/32219: optimizer causees wrong code in pic/hidden/weak symbol checking

2015-03-05 Thread Alex Velenko


On 03/03/15 15:58, Alex Velenko wrote:

On 19/02/15 17:26, Richard Henderson wrote:

On 02/19/2015 09:08 AM, Alex Velenko wrote:
Your suggestion seems to fix gcc.target/arm/long-calls-1.c, but has to be
thoroughly tested.


Before you do complete testing, please also delete the TREE_STATIC test.
That bit should never be relevant to functions, as it indicates not that
it is in the compilation unit, but that it has static (as opposed to
automatic) storage duration.  Thus it is only relevant to variables.


r~



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7bf5b4d..777230e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6392,14 +6392,8 @@ arm_set_default_type_attributes (tree type)
  static bool
  arm_function_in_section_p (tree decl, section *section)
  {
-  /* We can only be certain about functions defined in the same
- compilation unit.  */
-  if (!TREE_STATIC (decl))
-return false;
-
-  /* Make sure that SYMBOL always binds to the definition in this
- compilation unit.  */
-  if (!targetm.binds_local_p (decl))
+  /* We can only be certain about the prevailing symbol definition.  */
+  if (!decl_binds_to_current_def_p (decl))
  return false;

/* If DECL_SECTION_NAME is set, assume it is trustworthy. */




Hi,

Did a bootstrap and a full regression run on arm-none-linux-gnueabihf;
no new regressions found.  Some previously failing tests in libstdc++
started to fail differently, for example:


< ERROR: 22_locale/num_get/get/wchar_t/2.cc: can't read 
"additional_sources": no such variable for " dg-do 22 run { xfail 
lax_strtof\

p } "
< UNRESOLVED: 22_locale/num_get/get/wchar_t/2.cc: can't read 
"additional_sources": no such variable for " dg-do 22 run { xfail lax_s\

trtofp } "
---
> ERROR: 22_locale/num_get/get/wchar_t/2.cc: can't read 
"et_cache(uclibc,value)": no such element in array for " dg-do 22 run 
{ xfai\

l lax_strtofp } "
> UNRESOLVED: 22_locale/num_get/get/wchar_t/2.cc: can't read 
"et_cache(uclibc,value)": no such element in array for " dg-do 22 run {\

 xfail lax_strtofp } "


But I think it is okay.

Kind regards,
Alex




Hi,
Ping. Could someone please approve Richard's patch?
This issue needs fixing.
Kind regards,
Alex



[committed] Fix ubsan test

2015-03-05 Thread Marek Polacek
This test should be run rather than just compiled.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2015-03-05  Marek Polacek  

* c-c++-common/ubsan/bounds-6.c: Use dg-do run.

diff --git gcc/testsuite/c-c++-common/ubsan/bounds-6.c 
gcc/testsuite/c-c++-common/ubsan/bounds-6.c
index e7d15d5..aef2055 100644
--- gcc/testsuite/c-c++-common/ubsan/bounds-6.c
+++ gcc/testsuite/c-c++-common/ubsan/bounds-6.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-options "-fsanitize=bounds -Wall -Wextra" } */
 
 /* Test off-by-one.  */

Marek


RE: [PATCH] Fix another wrong-code bug with -fstrict-volatile-bitfields

2015-03-05 Thread Bernd Edlinger
Hi,

On Thu, 5 Mar 2015 12:24:56, Richard Biener wrote:
>
> On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
>  wrote:
>> Hi,
>>
>> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:
>>>
>>> On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>>
>> Maybe it should check that MEM_ALIGN (op0)>= GET_MODE_ALIGNMENT (fieldmode)
>> Because the strict volatile bitfields handling will inevitably try to use
>> the fieldmode to access the memory.
>>
>> Or would it be better to say MEM_ALIGN (op0)>= GET_MODE_BITSIZE (fieldmode),
>> because it is easier to explain when some one asks, when we guarantee the 
>> semantics
>> of strict volatile bitfield?
>
> But on non-strict-align targets you can even for 1-byte aligned MEMs
> access an SImode field directly. So the old code looks correct to me
> here and the fix needs to be done somewhere else.
>

But this SImode access is split up into several QImode or HImode accesses
in the processor's execution pipeline; on an external bus like AXI all memory
transactions are ultimately aligned.  The difference is just that some
processors can split up the unaligned accesses themselves and some need help
from the compiler, but even on x86 unaligned accesses have different
semantics: they are no longer atomic, while an aligned access is always
executed as an atomic transaction on an x86 processor.
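
Just to make the alignment case concrete, a minimal sketch (not a testcase
from the PR, only an illustration):

struct S
{
  char c;
  volatile unsigned int f : 24;   /* bits 8..31, fits in one SImode word */
} __attribute__ ((packed));

Here the field itself does not straddle a word, but the packed struct is
only 1-byte aligned, so on a STRICT_ALIGNMENT target the single SImode
access that -fstrict-volatile-bitfields asks for may be misaligned - which
is the case a MEM_ALIGN check would catch.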

>> Probably there is already something in the logic in expr.c that prevents 
>> these cases,
>> because otherwise it would be way to easy to find an example for unaligned 
>> accesses
>> to unaligned memory on STRICT_ALIGNMENT targets.
>>
>>
>> Ok, what would you think about this variant?
>>
>> --- expmed.c.jj 2015-01-16 11:20:40.0 +0100
>> +++ expmed.c 2015-03-05 11:50:09.40016 +0100
>> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>> return false;
>>
>> /* Check for cases of unaligned fields that must be split. */
>> - if (bitnum % BITS_PER_UNIT + bitsize> modesize
>> - || (STRICT_ALIGNMENT
>> - && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize> modesize))
>> + if (bitnum % modesize + bitsize> modesize)
>> + return false;
>> +
>> + /* Check that the memory is sufficiently aligned. */
>> + if (MEM_ALIGN (op0) < modesize)
>
> I think that only applies to strict-align targets and then only for
> GET_MODE_ALIGNMENT (modesize). And of course what matters
> is the alignment at bitnum - even though op0 may be not sufficiently
> aligned it may have known misalignment so that op0 + bitnum is
> sufficiently aligned.
>
> Testcases would need to annotate structs with packed/aligned attributes
> to get at these cases.
>
> For the testcase included in the patch, what does the patch end up doing?
> Not going the strict-volatile bitfield expansion path? That looks unnecessary
> on !strict-alignment targets but resonable on strict-align targets where the
> access would need to be splitted. So, why does it end up being splitted
> on !strict-align targets?
>

gcc -fstrict-volatile-bitfields -S 20150304-1.c
without the patch we find this in 20150304-1.s:

main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movzbl  global+1(%rip), %eax
    orl $-1, %eax
    movb    %al, global+1(%rip)
    movzbl  global+2(%rip), %eax
    orl $-1, %eax
    movb    %al, global+2(%rip)
    movzbl  global+3(%rip), %eax
    orl $-1, %eax
    movb    %al, global+3(%rip)
    movzbl  global+4(%rip), %eax
    orl $127, %eax
    movb    %al, global+4(%rip)
    movl    global(%rip), %eax
    shrl    $8, %eax
    andl    $2147483647, %eax
    cmpl    $2147483647, %eax
    je  .L2
    call    abort
.L2:
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc


so the write path is OK, because strict_volatile_bitfield_p returns false
due to this check:

  /* Check for cases where the C++ memory model applies.  */
  if (bitregion_end != 0
  && (bitnum - bitnum % modesize < bitregion_start
  || bitnum - bitnum % modesize + modesize - 1> bitregion_end))
    return false;


bitregion_start=8, bitregion_end=39
bitnum - bitnum % modesize = 0
bitnum - bitnum % modesize + modesize - 1 = 31


this does not happen in the read path, and the problem is that the access
here does not work:

  str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
  &bitnum);
  /* Explicitly override the C/C++ memory model; ignore the
 bit range so that we can do the access in the mode mandated
 by -fstrict-volatile-bitfields instead.  */
  store_fixed_bit_field_1 (str_rtx, bitsize, bitnum, value);


str_rtx = unchanged, bitnum = 8, bitsize = 31, but store_fixed_bit_field_1
cannot handle that.

BTW: I can

Re: [PATCH] PR rtl-optimization/32219: optimizer causees wrong code in pic/hidden/weak symbol checking

2015-03-05 Thread Ramana Radhakrishnan



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7bf5b4d..777230e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6392,14 +6392,8 @@ arm_set_default_type_attributes (tree type)
  static bool
  arm_function_in_section_p (tree decl, section *section)
  {
-  /* We can only be certain about functions defined in the same
- compilation unit.  */
-  if (!TREE_STATIC (decl))
-return false;
-
-  /* Make sure that SYMBOL always binds to the definition in this
- compilation unit.  */
-  if (!targetm.binds_local_p (decl))
+  /* We can only be certain about the prevailing symbol definition.  */
+  if (!decl_binds_to_current_def_p (decl))
  return false;

/* If DECL_SECTION_NAME is set, assume it is trustworthy. */





Sorry to have missed this - I've also been traveling recently, which has
made it harder to keep up with patch traffic - this is OK if there are no
regressions.


Please apply with an appropriate ChangeLog.

Ramana



[PATCH, committed] jit documentation fixes

2015-03-05 Thread David Malcolm
On Thu, 2015-03-05 at 07:37 +0100, Bert Wesarg wrote:
> Hi David,
> 
> while reading the very good tutorial at
> 
> https://gcc.gnu.org/onlinedocs/jit/intro/tutorial03.html
> 
> I noticed that the calls to gcc_jit_block_end_with_conditional()
> miss the on_true and on_false parameters.

Good catch, thanks!

This also affected the corresponding docs for the C++ bindings, and I
noticed a few other issues whilst re-reading the docs, so I've
committed the following patch to trunk (as r221218).

Dave

gcc/jit/ChangeLog:
* docs/cp/intro/tutorial03.rst: Add missing arguments to
gccjit::block::end_with_conditional call.  Add on_true/on_false
comments.  Tweak the wording.
* docs/intro/tutorial03.rst: Add missing arguments to
gcc_jit_block_end_with_conditional call.  Add some clarifying
comments.
* docs/topics/compilation.rst: Tweak the wording to avoid an
ambiguous use of "this".
* docs/topics/contexts.rst: Fix a typo.
* docs/topics/expressions.rst (GCC_JIT_BINARY_OP_MINUS): Remove
a stray backtick.
---
 gcc/jit/docs/cp/intro/tutorial03.rst | 10 ++
 gcc/jit/docs/intro/tutorial03.rst| 12 +++-
 gcc/jit/docs/topics/compilation.rst  |  2 +-
 gcc/jit/docs/topics/contexts.rst |  2 +-
 gcc/jit/docs/topics/expressions.rst  |  2 +-
 5 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/gcc/jit/docs/cp/intro/tutorial03.rst 
b/gcc/jit/docs/cp/intro/tutorial03.rst
index aac781d..f4405ad 100644
--- a/gcc/jit/docs/cp/intro/tutorial03.rst
+++ b/gcc/jit/docs/cp/intro/tutorial03.rst
@@ -238,7 +238,9 @@ and can then use this to add `b_loop_cond`'s sole 
statement, via
 
 .. code-block:: c++
 
-  b_loop_cond.end_with_conditional (guard);
+  b_loop_cond.end_with_conditional (guard,
+b_after_loop, // on_true
+b_loop_body); // on_false
 
 However :type:`gccjit::rvalue` has overloaded operators for this, so we
 express the conditional as
@@ -247,14 +249,14 @@ express the conditional as
 
gccjit::rvalue guard = (i >= n);
 
-and hence write the block more concisely as:
+and hence we can write the block more concisely as:
 
 .. code-block:: c++
 
   b_loop_cond.end_with_conditional (
 i >= n,
-b_after_loop,
-b_loop_body);
+b_after_loop, // on_true
+b_loop_body); // on_false
 
 Next, we populate the body of the loop.
 
diff --git a/gcc/jit/docs/intro/tutorial03.rst 
b/gcc/jit/docs/intro/tutorial03.rst
index cd7136a..6c1ca3e 100644
--- a/gcc/jit/docs/intro/tutorial03.rst
+++ b/gcc/jit/docs/intro/tutorial03.rst
@@ -229,6 +229,7 @@ We build the comparison using 
:c:func:`gcc_jit_context_new_comparison`:
 
 .. code-block:: c
 
+  /* (i >= n) */
gcc_jit_rvalue *guard =
  gcc_jit_context_new_comparison (
ctxt, NULL,
@@ -241,7 +242,16 @@ and can then use this to add `b_loop_cond`'s sole 
statement, via
 
 .. code-block:: c
 
-  gcc_jit_block_end_with_conditional (b_loop_cond, NULL, guard);
+  /* Equivalent to:
+   if (guard)
+ goto after_loop;
+   else
+ goto loop_body;  */
+  gcc_jit_block_end_with_conditional (
+b_loop_cond, NULL,
+guard,
+b_after_loop, /* on_true */
+b_loop_body); /* on_false */
 
 Next, we populate the body of the loop.
 
diff --git a/gcc/jit/docs/topics/compilation.rst 
b/gcc/jit/docs/topics/compilation.rst
index 708d009..4eddf76 100644
--- a/gcc/jit/docs/topics/compilation.rst
+++ b/gcc/jit/docs/topics/compilation.rst
@@ -37,7 +37,7 @@ In-memory compilation
This calls into GCC and builds the code, returning a
`gcc_jit_result *`.
 
-   If this is non-NULL, the caller becomes responsible for
+   If the result is non-NULL, the caller becomes responsible for
calling :func:`gcc_jit_result_release` on it once they're done
with it.
 
diff --git a/gcc/jit/docs/topics/contexts.rst b/gcc/jit/docs/topics/contexts.rst
index 46a08bd..b7f281a 100644
--- a/gcc/jit/docs/topics/contexts.rst
+++ b/gcc/jit/docs/topics/contexts.rst
@@ -138,7 +138,7 @@ be responsible for all of the rest:
If no errors occurred, this will be NULL.
 
 If you are wrapping the C API for a higher-level language that supports
-exception-handling, you may instead by interested in the last error that
+exception-handling, you may instead be interested in the last error that
 occurred on the context, so that you can embed this in an exception:
 
 .. function:: const char *\
diff --git a/gcc/jit/docs/topics/expressions.rst 
b/gcc/jit/docs/topics/expressions.rst
index 1cedb66..49317b9 100644
--- a/gcc/jit/docs/topics/expressions.rst
+++ b/gcc/jit/docs/topics/expressions.rst
@@ -233,7 +233,7 @@ Binary Operation  C equivalent
 
For pointer addition, use :c:func:`gcc_jit_context_new_array_access`.
 
-.. c:macro:: GCC_JIT_BINARY_OP_MINUS`
+.. c:macro:: GCC_JIT_BINARY_OP_MINUS
 
Subtraction of arithmetic values; analogous to:
 
-- 
1.8.5.3



Re: [PATCH] Fix another wrong-code bug with -fstrict-volatile-bitfields

2015-03-05 Thread Richard Biener
On Thu, Mar 5, 2015 at 4:05 PM, Bernd Edlinger
 wrote:
> Hi,
>
> On Thu, 5 Mar 2015 12:24:56, Richard Biener wrote:
>>
>> On Thu, Mar 5, 2015 at 12:00 PM, Bernd Edlinger
>>  wrote:
>>> Hi,
>>>
>>> On Thu, 5 Mar 2015 10:00:26, Richard Biener wrote:

 On Wed, Mar 4, 2015 at 3:13 PM, Bernd Edlinger
>>> Maybe one thing is missing from strict_volatile_bitfield_p, I am not sure.
>>>
>>> Maybe it should check that MEM_ALIGN (op0)>= GET_MODE_ALIGNMENT (fieldmode)
>>> Because the strict volatile bitfields handling will inevitably try to use
>>> the fieldmode to access the memory.
>>>
>>> Or would it be better to say MEM_ALIGN (op0)>= GET_MODE_BITSIZE (fieldmode),
>>> because it is easier to explain when some one asks, when we guarantee the 
>>> semantics
>>> of strict volatile bitfield?
>>
>> But on non-strict-align targets you can even for 1-byte aligned MEMs
>> access an SImode field directly. So the old code looks correct to me
>> here and the fix needs to be done somewhere else.
>>
>
> But this SImode access is split up in several QImode or HImode accesses,
> in the processor's execution pipeline, finally on an external bus like AXI 
> all memory
> transactions are aligned.  The difference is just that some processors can 
> split up
> the unaligned accesses and some need help from the compiler, but even on an
> x86 we have a different semantics for unaligned acceses, that is they are no 
> longer atomic,
> while an aligned access is always executed as an atomic transaction on an x86 
> processor.
>
>>> Probably there is already something in the logic in expr.c that prevents 
>>> these cases,
>>> because otherwise it would be way to easy to find an example for unaligned 
>>> accesses
>>> to unaligned memory on STRICT_ALIGNMENT targets.
>>>
>>>
>>> Ok, what would you think about this variant?
>>>
>>> --- expmed.c.jj 2015-01-16 11:20:40.0 +0100
>>> +++ expmed.c 2015-03-05 11:50:09.40016 +0100
>>> @@ -472,9 +472,11 @@ strict_volatile_bitfield_p (rtx op0, uns
>>> return false;
>>>
>>> /* Check for cases of unaligned fields that must be split. */
>>> - if (bitnum % BITS_PER_UNIT + bitsize> modesize
>>> - || (STRICT_ALIGNMENT
>>> - && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize> modesize))
>>> + if (bitnum % modesize + bitsize> modesize)
>>> + return false;
>>> +
>>> + /* Check that the memory is sufficiently aligned. */
>>> + if (MEM_ALIGN (op0) < modesize)
>>
>> I think that only applies to strict-align targets and then only for
>> GET_MODE_ALIGNMENT (modesize). And of course what matters
>> is the alignment at bitnum - even though op0 may be not sufficiently
>> aligned it may have known misalignment so that op0 + bitnum is
>> sufficiently aligned.
>>
>> Testcases would need to annotate structs with packed/aligned attributes
>> to get at these cases.
>>
>> For the testcase included in the patch, what does the patch end up doing?
>> Not going the strict-volatile bitfield expansion path? That looks unnecessary
>> on !strict-alignment targets but resonable on strict-align targets where the
>> access would need to be splitted. So, why does it end up being splitted
>> on !strict-align targets?
>>
>
> gcc -fstrict-volatile-bitfields -S 20150304-1.c
> without patch we find this in 20150304-1.s:
>
> main:
> .LFB0:
> .cfi_startproc
> pushq   %rbp
> .cfi_def_cfa_offset 16
> .cfi_offset 6, -16
> movq%rsp, %rbp
> .cfi_def_cfa_register 6
> movzbl  global+1(%rip), %eax
> orl $-1, %eax
> movb%al, global+1(%rip)
> movzbl  global+2(%rip), %eax
> orl $-1, %eax
> movb%al, global+2(%rip)
> movzbl  global+3(%rip), %eax
> orl $-1, %eax
> movb%al, global+3(%rip)
> movzbl  global+4(%rip), %eax
> orl $127, %eax
> movb%al, global+4(%rip)
> movlglobal(%rip), %eax
> shrl$8, %eax
> andl$2147483647, %eax
> cmpl$2147483647, %eax
> je  .L2
> callabort
> .L2:
> movl$0, %eax
> popq%rbp
> .cfi_def_cfa 7, 8
> ret
> .cfi_endproc
>
>
> so the write path is OK, because strict_volatile_bitfield_p returns false,
> because
>
>   /* Check for cases where the C++ memory model applies.  */
>   if (bitregion_end != 0
>   && (bitnum - bitnum % modesize < bitregion_start
>   || bitnum - bitnum % modesize + modesize - 1> bitregion_end))
> return false;
>
>
> bitregion_start=8, bitregion_end=39
> bitnum - bitnum % modesize = 0
> bitnum - bitnum % modesize + modesize - 1 = 31
>
>
> this does not happen in the read path, and the problem is the access
> here does not work:
>
>   str_rtx = narrow_bit_field_mem (str_rtx, fieldmode, bitsize, bitnum,
>   &bitnum);
>   /* Explicitly override the C/C++ memory model; ignore the
>  bit range so that we can do the access in

[PATCH] Improve memory usage on PR64928

2015-03-05 Thread Richard Biener

I am currently testing the following patch to reduce peak memory
usage of the out-of-SSA phase for the testcase in the PR.  The
issue is (as usual) big live-info and SSA-conflict-graph memory use.
This patch tackles the live-info side and frees livein before computing
the conflict graph (which only needs liveout).
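
The mechanics are the usual bitmap-obstack idiom; roughly (a sketch of the
idiom the patch relies on, not the patch itself):

bitmap_obstack livein_obstack;
bitmap_obstack_initialize (&livein_obstack);

/* Every livein bitmap is allocated on its own obstack ...  */
bitmap livein = BITMAP_ALLOC (&livein_obstack);
bitmap_set_bit (livein, 42);

/* ... so once only liveout is still needed (for the conflict graph),
   all livein bitmaps can be released in one go instead of staying
   around until the end of the pass.  */
bitmap_obstack_release (&livein_obstack);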

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

Richard.

2015-03-05  Richard Biener  

PR middle-end/64928
* tree-ssa-live.h (struct tree_live_info_d): Add livein_obstack
and liveout_obstack members.
(calculate_live_on_exit): Remove.
(delete_tree_live_info_livein): Declare.
(delete_tree_live_info_liveout): Likewise.
* tree-ssa-live.c (liveness_bitmap_obstack): Remove global var.
(new_tree_live_info): Adjust.
(delete_tree_live_info_livein): New function.
(delete_tree_live_info_liveout): Likewise.
(delete_tree_live_info): Deal with partly deleted live info.
(loe_visit_block): Remove temporary bitmap by using
bitmap_ior_and_compl_into.
(live_worklist): Adjust accordingly.
(calculate_live_on_exit): Make static.
(calculate_live_ranges): Do not initialize liveness_bitmap_obstack.

Index: gcc/tree-ssa-coalesce.c
===
--- gcc/tree-ssa-coalesce.c (revision 221205)
+++ gcc/tree-ssa-coalesce.c (working copy)
@@ -1345,6 +1345,7 @@ coalesce_ssa_name (void)
 dump_var_map (dump_file, map);
 
   liveinfo = calculate_live_ranges (map);
+  delete_tree_live_info_livein (liveinfo);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 dump_live_info (dump_file, liveinfo, LIVEDUMP_ENTRY);
Index: gcc/tree-ssa-live.c
===
--- gcc/tree-ssa-live.c (revision 221205)
+++ gcc/tree-ssa-live.c (working copy)
@@ -973,13 +973,6 @@ remove_unused_locals (void)
   timevar_pop (TV_REMOVE_UNUSED);
 }
 
-/* Obstack for globale liveness info bitmaps.  We don't want to put these
-   on the default obstack because these bitmaps can grow quite large and
-   we'll hold on to all that memory until the end of the compiler run.
-   As a bonus, delete_tree_live_info can destroy all the bitmaps by just
-   releasing the whole obstack.  */
-static bitmap_obstack liveness_bitmap_obstack;
-
 /* Allocate and return a new live range information object base on MAP.  */
 
 static tree_live_info_p
@@ -992,31 +985,61 @@ new_tree_live_info (var_map map)
   live->map = map;
   live->num_blocks = last_basic_block_for_fn (cfun);
 
+  bitmap_obstack_initialize (&live->livein_obstack);
+  bitmap_obstack_initialize (&live->liveout_obstack);
   live->livein = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
   FOR_EACH_BB_FN (bb, cfun)
-bitmap_initialize (&live->livein[bb->index], &liveness_bitmap_obstack);
+bitmap_initialize (&live->livein[bb->index], &live->livein_obstack);
 
   live->liveout = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
   FOR_EACH_BB_FN (bb, cfun)
-bitmap_initialize (&live->liveout[bb->index], &liveness_bitmap_obstack);
+bitmap_initialize (&live->liveout[bb->index], &live->liveout_obstack);
 
   live->work_stack = XNEWVEC (int, last_basic_block_for_fn (cfun));
   live->stack_top = live->work_stack;
 
-  live->global = BITMAP_ALLOC (&liveness_bitmap_obstack);
+  live->global = BITMAP_ALLOC (NULL);
   return live;
 }
 
 
+/* Free storage for livein of the live range info object LIVE.  */
+
+void
+delete_tree_live_info_livein (tree_live_info_p live)
+{
+  bitmap_obstack_release (&live->livein_obstack);
+  free (live->livein);
+  live->livein = NULL;
+}
+
+/* Free storage for liveout of the live range info object LIVE.  */
+
+void
+delete_tree_live_info_liveout (tree_live_info_p live)
+{
+  bitmap_obstack_release (&live->liveout_obstack);
+  free (live->liveout);
+  live->liveout = NULL;
+}
+
 /* Free storage for live range info object LIVE.  */
 
 void
 delete_tree_live_info (tree_live_info_p live)
 {
-  bitmap_obstack_release (&liveness_bitmap_obstack);
+  if (live->livein)
+{
+  bitmap_obstack_release (&live->livein_obstack);
+  free (live->livein);
+}
+  if (live->liveout)
+{
+  bitmap_obstack_release (&live->liveout_obstack);
+  free (live->liveout);
+}
+  BITMAP_FREE (live->global);
   free (live->work_stack);
-  free (live->liveout);
-  free (live->livein);
   free (live);
 }
 
@@ -1027,8 +1050,7 @@ delete_tree_live_info (tree_live_info_p
it each time.  */
 
 static void
-loe_visit_block (tree_live_info_p live, basic_block bb, sbitmap visited,
-bitmap tmp)
+loe_visit_block (tree_live_info_p live, basic_block bb, sbitmap visited)
 {
   edge e;
   bool change;
@@ -1046,17 +1068,17 @@ loe_visit_block (tree_live_info_p live,
   pred_bb = e->src;
   if (pred_bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
continue;
-  /* TMP is variables live-on-entry from BB that aren't defined in the
+

Re: [patch/committed] PR middle-end/65233 make walk-ssa_copies handle empty PHIs

2015-03-05 Thread Jeff Law

On 03/05/15 01:47, Richard Biener wrote:

On Thu, Mar 5, 2015 at 1:54 AM, Jan Hubicka  wrote:


It gets passed the valueize callback now which returns NULL_TREE for
SSA names we can't follow.


Btw, for match-and-simplify I had to use that as default for fold_stmt
_exactly_ because of the call to fold_stmt from replace_uses_by
via merge-blocks from cfgcleanup.  This is because replace-uses-by
doesn't have all uses replaced before it folds the stmt!

We also have the "weaker" in-place flag.

2015-03-04  Richard Biener  

 PR middle-end/65233
 * ipa-polymorphic-call.c: Include tree-ssa-operands.h and
 tree-into-ssa.h.
 (walk_ssa_copies): Revert last chage.  Instead do not walk
 SSA names registered for SSA update.


Maybe include the patch?  It should not be problem to make the function
to valuelize everything it looks into.


I attached it.

Well, I think for stage1 the fix is to not call fold_stmt from CFG hooks or
CFG cleanup.  Merge-blocks can just demote PHIs to assignments and
leave propagation to followup cleanups (we can of course propagate
virtual operands).
Seems reasonable.  Though I'd also like to see us look to narrow the 
window in which things are in this odd state.  It's just asking for long 
term maintenance headaches.


Removal of unreachable blocks so that we can compute dominators should, 
in theory, never need to look at this stuff and that's a much smaller 
window than all of tree_cleanup_cfg.



Along the same lines I want to tackle the ssa name manager issues that 
are loosely related.


Jeff



Re: [Patch] PR 65315 - Fix alignment of local variables

2015-03-05 Thread Jeff Law

On 03/04/15 12:50, Steve Ellcey  wrote:

While examining some MIPS code I noticed that GCC did not seem to be
fully honoring the aligned attribute on some local variables.  I submitted
PR middle-end/65315 to record the bug and I think I now understand it and
have a fix.  The problem was that expand_stack_vars seemed to think that
the first entry in stack_vars_sorted would have the largest alignment but
while all the variables that had alignment greater then
MAX_SUPPORTED_STACK_ALIGNMENT would come before all variables whose
alignment was less than MAX_SUPPORTED_STACK_ALIGNMENT, within the variables
with the alignment greater than MAX_SUPPORTED_STACK_ALIGNMENT, they
were sorted by size, not by alignment.

So my fix was to update large_align in expand_stack_vars if needed.

I have verified the fix on the MIPS test case in PR 65315 and am doing a
regression test now.  OK to checkin if there are no regressions?

I wasn't sure how to create a generic test case, I was checking the
alignment on MIPS by hand by looking for the shift-right/shift-left
instructions that create an aligned pointer but that obviously doesn't
work on other architectures.

Steve Ellcey
sell...@imgtec.com


2015-03-04  Steve Ellcey  

PR middle-end/65315
* cfgexpand.c (expand_stack_vars): Update large_align to maximum
needed alignment.

Please include a testcase for the testsuite if at all possible.

OK for the trunk.

jeff


Re: [Patch] PR 65315 - Fix alignment of local variables

2015-03-05 Thread Steve Ellcey
On Thu, 2015-03-05 at 09:36 +0100, Richard Biener wrote:

> >
> > I have verified the fix on the MIPS test case in PR 65315 and am doing a
> > regression test now.  OK to checkin if there are no regressions?
> 
> It looks like large_align vars are dynamically allocated and thus they
> should be sorted as sizeof (void *) I suppose.
> 
> Do you have a testcase?
> 
> Ok.
> 
> Thanks,
> Richard.

Here is a test case that I used on MIPS.  The 'b' and 'c' arrays both
have alignments greater than MAX_SUPPORTED_STACK_ALIGNMENT but because
'b' is larger in size than 'c' it gets sorted earlier and its alignment
is used to align the dynamically allocated memory instead of 'c' which
has a greater alignment requirement.

Steve Ellcey
sell...@imgtec.com


int foo(int *x)
{
  int i, y;
  int a[40];
  int b[50] __attribute__ ((aligned(32)));
  int c[40] __attribute__ ((aligned(128)));

  for (i = 0; i < 40; i++)
    a[i] = *x++;
  for (i = 0; i < 40; i++)
    b[i] = *x++;
  for (i = 0; i < 40; i++)
    c[i] = *x++;
  y = -99;
  for (i = 0; i < 40; i++)
    y = y + a[i] + b[i] + c[i];
  return y;
}




Re: [PATCH][ARM] Remove an unused reload hook.

2015-03-05 Thread Matthew Wahab

On 27/02/15 09:41, Richard Earnshaw wrote:

On 19/02/15 12:19, Matthew Wahab wrote:

The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the
ARM backend no longer supports reload, this macro is not needed and this
patch removes it.

gcc/
2015-02-19  Matthew Wahab  

 * config/arm/arm.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
 (ARM_LEGITIMIZE_RELOAD_ADDRESS): Remove.
 (THUMB_LEGITIMIZE_RELOAD_ADDRESS): Remove.
 * config/arm/arm.c (arm_legitimize_reload_address): Remove.
 (thumb_legitimize_reload_address): Remove.
 * config/arm/arm-protos.h (arm_legitimize_reload_address):
 Remove.
 (thumb_legitimize_reload_address): Remove.



This is OK for stage 1.

I have one open question: can LRA generate the optimizations that these
hooks used to provide through reload?  If not, please could you file
some bugzilla reports so that we don't lose them.

Thanks,
R.


arm_legitimize_reload_address was added by 
https://gcc.gnu.org/ml/gcc-patches/2011-04/msg00605.html. From 
config/arm/arm.c, the optimization turns

 add t1, r2, #4096
 ldr r0, [t1, #4]
 add t2, r2, #4096
 ldr r1, [t2, #8]
into
 add t1, r2, #4096
 ldr r0, [t1, #4]
 ldr r1, [t1, #8]

As far as I can tell, LRA does do this. Compiling the following with -O1:

int bar(int, int, int);
int test1(int* buf)
{
  int a = buf[41000];
  int b = buf[41004];
  int c = buf[41008];
  bar(a, b, c);
  return a +  b + c;
}

gcc version 4.5.1 (Sourcery G++ Lite 2010.09-51), which predates the 
optimization, produces

ldr r3, .L2
ldr r4, [r0, r3]
add r3, r3, #16
ldr r5, [r0, r3]
add r3, r3, #16
ldr r6, [r0, r3]

gcc version 4.9.3 20141119, with and without -mno-lra, produces

add r0, r0, #163840
ldr r4, [r0, #160]
ldr r6, [r0, #176]
ldr r5, [r0, #192]

so it looks like the better sequence gets generated.

thumb_legitimize_reload_address was added by 
https://gcc.gnu.org/ml/gcc-patches/2005-08/msg01140.html to fix PR 
23436. It replaces sequences like

mov r3, r9
mov r2, r10
ldr r0, [r3, r2]
with
mov r3, r9
add r3, r3, r10
ldr r0, [r3]

This looks like it's missing from trunk so I'll open a bugzilla report 
for it.


It's quite possible that I've got this all wrong so if I've missed 
something or you'd like me to open a bugzilla report for the ARM 
optimization as well, let me know.


Matthew




Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Jeff Law

On 03/05/15 07:37, Martin Liška wrote:



 From 3f35d9ec57880409cde384bb7b9e8dbaae5231ef Mon Sep 17 00:00:00 2001
From: mliska
Date: Thu, 5 Mar 2015 13:41:07 +0100
Subject: [PATCH] Fix PR ipa/65318.

gcc/ChangeLog:

2015-03-05  Martin Liska

PR ipa/65318
* ipa-icf.c (sem_variable::equals): Compare variables types.

gcc/testsuite/ChangeLog:

2015-03-05  Martin Liska

* gcc.dg/ipa/pr65318.c: New test.

OK.
Jeff



[PATCH] Add new target h8300-*-linux

2015-03-05 Thread Yoshinori Sato
Add an h8300-*-linux target for the h8300 Linux kernel and userland.

h8300-*-elf differs in some respects from a standard ELF target;
h8300-*-linux follows the standard ELF rules.

Thanks.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index cfacea1..fc5101c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-03-06  Yoshinori Sato 
+
+   * config.gcc: Add h8300-*-linux
+   * config/h8300/h8300.c (h8300_option_override):
+   Exclusive -mh vs -ms/-msx
+   (h8300_file_start): Target priority -msx > -ms > -mh
+   * config/h8300/linux.h: New file.
+   * config/h8300/t-linux: Likewise.
+
 2015-03-05  Martin Liska  
 
* ipa-inline.c (inline_small_functions): Set default value to
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 483c672..975f3f6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1211,6 +1211,10 @@ h8300-*-elf*)
tmake_file="h8300/t-h8300"
tm_file="h8300/h8300.h dbxelf.h elfos.h newlib-stdint.h h8300/elf.h"
;;
+h8300-*-linux*)
+   tmake_file="${tmake_file} h8300/t-h8300 h8300/t-linux"
+   tm_file="h8300/h8300.h dbxelf.h elfos.h gnu-user.h linux.h 
glibc-stdint.h h8300/linux.h"
+   ;;
 hppa*64*-*-linux*)
target_cpu_default="MASK_PA_11|MASK_PA_20"
tm_file="pa/pa64-start.h ${tm_file} dbxelf.h elfos.h gnu-user.h linux.h 
\
diff --git a/gcc/config/h8300/h8300.c b/gcc/config/h8300/h8300.c
index 4e9110e..9862b7e 100644
--- a/gcc/config/h8300/h8300.c
+++ b/gcc/config/h8300/h8300.c
@@ -370,6 +370,11 @@ h8300_option_override (void)
   h8_pop_op = h8_pop_ops[cpu_type];
   h8_mov_op = h8_mov_ops[cpu_type];
 
+  if (TARGET_H8300H && (TARGET_H8300S || TARGET_H8300SX))
+{
+  target_flags ^= MASK_H8300H;
+}
+
   if (!TARGET_H8300S && TARGET_MAC)
 {
   error ("-ms2600 is used without -ms");
@@ -1006,12 +1011,12 @@ h8300_file_start (void)
 {
   default_file_start ();
 
-  if (TARGET_H8300H)
-fputs (TARGET_NORMAL_MODE ? "\t.h8300hn\n" : "\t.h8300h\n", asm_out_file);
-  else if (TARGET_H8300SX)
+  if (TARGET_H8300SX)
 fputs (TARGET_NORMAL_MODE ? "\t.h8300sxn\n" : "\t.h8300sx\n", 
asm_out_file);
   else if (TARGET_H8300S)
 fputs (TARGET_NORMAL_MODE ? "\t.h8300sn\n" : "\t.h8300s\n", asm_out_file);
+  else if (TARGET_H8300H)
+fputs (TARGET_NORMAL_MODE ? "\t.h8300hn\n" : "\t.h8300h\n", asm_out_file);
 }
 
 /* Output assembly language code for the end of file.  */
diff --git a/gcc/config/h8300/linux.h b/gcc/config/h8300/linux.h
new file mode 100644
index 000..995d320
--- /dev/null
+++ b/gcc/config/h8300/linux.h
@@ -0,0 +1,47 @@
+/* Definitions of target machine for GNU compiler.
+   Renesas H8/300 (linux variant)
+   Copyright (C) 2015
+   Free Software Foundation, Inc.
+   Contributed by Yoshinori Sato 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef GCC_H8300_LINUX_H
+#define GCC_H8300_LINUX_H
+
+#define TARGET_OS_CPP_BUILTINS() \
+  do   \
+{  \
+  GNU_USER_TARGET_OS_CPP_BUILTINS();   \
+}  \
+  while (0)
+
+#undef LINK_SPEC
+#define LINK_SPEC "%{mh:%{!mn:-m h8300helf_linux}} %{ms:%{!mn:-m 
h8300self_linux}}"
+
+#undef TARGET_DEFAULT
+#define TARGET_DEFAULT (MASK_QUICKCALL | MASK_INT32 | MASK_H8300H)
+
+/* Width of a word, in units (bytes).  */
+#undef DOUBLE_TYPE_SIZE
+#define DOUBLE_TYPE_SIZE   64
+
+#undef DEFAULT_SIGNED_CHAR
+#define DEFAULT_SIGNED_CHAR 1
+
+#undef USER_LABEL_PREFIX
+#endif /* ! GCC_H8300_LINUX_H */
diff --git a/gcc/config/h8300/t-linux b/gcc/config/h8300/t-linux
new file mode 100644
index 000..11237ea
--- /dev/null
+++ b/gcc/config/h8300/t-linux
@@ -0,0 +1,20 @@
+# Copyright (C) 2015 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# 

New German PO file for 'gcc' (version 5.1-b20150208)

2015-03-05 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-5.1-b20150208.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Jan Hubicka
> gcc/ChangeLog:
> 
> 2015-03-05  Martin Liska  
> 
>   PR ipa/65318
>   * ipa-icf.c (sem_variable::equals): Compare variables types.
> 
> gcc/testsuite/ChangeLog:
> 
> 2015-03-05  Martin Liska  
> 
>   * gcc.dg/ipa/pr65318.c: New test.

OK,
Honza


Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Jan Hubicka
> > gcc/ChangeLog:
> > 
> > 2015-03-05  Martin Liska  
> > 
> > PR ipa/65318
> > * ipa-icf.c (sem_variable::equals): Compare variables types.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2015-03-05  Martin Liska  
> > 
> > * gcc.dg/ipa/pr65318.c: New test.
> 
> OK,
Though actually I think it is papering over a folding issue - probably we do
want an implicit VIEW_CONVERT_EXPR when the type of the alias and the type of
the variable do not match.

Honza
> Honza


Re: RFC: PATCHES: Properly handle reference to protected data on x86

2015-03-05 Thread Rich Felker
On Thu, Mar 05, 2015 at 06:39:10AM -0800, H.J. Lu wrote:
> On Wed, Mar 4, 2015 at 3:26 PM, H.J. Lu  wrote:
> > Protected symbol means that it can't be pre-empted.  It
> > doesn't mean its address won't be external.  This is true
> > for pointer to protected function.  With copy relocation,
> > address of protected data defined in the shared library may
> > also be external.  We only know that for sure at run-time.
> > Here are patches for glibc, binutils and GCC to handle it
> > properly.
> >
> > Any comments?
> 
> This is the binutils patch I checked in.  It basically reverted
> the change for
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=15228
> 
> on x86.  Copy relocations against protected symbols should
> work.

Does it actually work now though? Last I checked gcc was generating
wrong code too -- GOT-relative accesses rather than accessing them
through the GOT. If that's the case, ld has no way to fix the problem.

Rich


Re: [PATCH][ARM] Remove an unused reload hook.

2015-03-05 Thread Matthew Wahab

On 05/03/15 16:34, Matthew Wahab wrote:


thumb_legitimize_reload_address was added by
https://gcc.gnu.org/ml/gcc-patches/2005-08/msg01140.html to fix PR
23436. It replaces sequences like
mov r3, r9
mov r2, r10
ldr r0, [r3, r2]
with
mov r3, r9
add r3, r3, r10
ldr r0, [r3]

This looks like it's missing from trunk so I'll open a bugzilla report
for it.


PR 65326 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65326).
Matthew



Re: RFC: PATCHES: Properly handle reference to protected data on x86

2015-03-05 Thread H.J. Lu
On Thu, Mar 5, 2015 at 9:32 AM, Rich Felker  wrote:
> On Thu, Mar 05, 2015 at 06:39:10AM -0800, H.J. Lu wrote:
>> On Wed, Mar 4, 2015 at 3:26 PM, H.J. Lu  wrote:
>> > Protected symbol means that it can't be pre-empted.  It
>> > doesn't mean its address won't be external.  This is true
>> > for pointer to protected function.  With copy relocation,
>> > address of protected data defined in the shared library may
>> > also be external.  We only know that for sure at run-time.
>> > Here are patches for glibc, binutils and GCC to handle it
>> > properly.
>> >
>> > Any comments?
>>
>> This is the binutils patch I checked in.  It basically reverted
>> the change for
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=15228
>>
>> on x86.  Copy relocations against protected symbols should
>> work.
>
> Does it actually work now though? Last I checked gcc was generating
> wrong code too -- GOT-relative accesses rather than accessing them
> through the GOT. If that's the case, ld has no way to fix the problem.
>

You need to apply both my GCC and glibc patches in

https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00257.html

-- 
H.J.


Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Jan Hubicka
Hi,
this patch solves the incorrect folding. The very same unification (ignoring
signedness by checking that the memory representation is the same) is done by
the constant pool.

Some of the other uses of ctor_for_folding therefore already use
VIEW_CONVERT_EXPR, I suppose as a partial fix for past bugs. This particular
case is handled by get_symbol_constant_value, which does not add a VCE. Maybe
we usually don't drop scalar constants to the constant pool that often, so
this did not show up.

Attached is a non-ICF testcase. It is a bit questionable whether we consider
this to be valid, but it is better to be safe than sorry.  Mixing
signed/unsigned may be more common with LTO.

Bootstrap/regtest running in x86_64-linux, seems sane?

Honza

static short a __attribute__ ((alias ("c")));
short b = -1;
static unsigned short c = 0;

int
main ()
{
  if (a <= b)
return 1;
  return 0;
}


* gimple-fold.c (get_symbol_constant_value): Convert to symbol type.

Index: gimple-fold.c
===
--- gimple-fold.c   (revision 221170)
+++ gimple-fold.c   (working copy)
@@ -263,7 +263,16 @@ get_symbol_constant_value (tree sym)
{
  val = canonicalize_constructor_val (unshare_expr (val), sym);
  if (val && is_gimple_min_invariant (val))
-   return val;
+   {
+  if (!useless_type_conversion_p (TREE_TYPE (sym), TREE_TYPE 
(val)))
+   {
+ if (operand_equal_p (TYPE_SIZE (TREE_TYPE (sym)),
+  TYPE_SIZE (TREE_TYPE (val)), 0))
+   return NULL_TREE;
+ val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (sym), val);
+   }
+ return val;
+   }
  else
return NULL_TREE;
}


Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Richard Biener
On March 5, 2015 7:08:16 PM CET, Jan Hubicka  wrote:
>Hi,
>this patch sovles the incorrect folding. The very same unification
>(ignoring
>signedness by checking that memory representation is the same) is done
>by
>constant pool.
>
>Some of the other uses of ctor_for_folding therefore already uses
>VIEW_CONVERT_EXPR, I suppose as a partial fix for past bugs. This
>particular
>case is handled by get_symbol_constant_value that does not VCE. Maybe
>we usually don't drop scalar constant to constant pool that often, so
>this
>did not show up.
>
>Attached is non-ICF testcase. It is bit questionable if we consider
>this to be
>valid, but it is better to be safe than sorry.  Mixing signed/unsigned
>may be
>more common with LTO.

With LTO we wrap all memory accesses in MEM_REFs during streaming, which
preserves the original types (and thus acts as a view-conversion if
necessary).  Without LTO the aliases should provide the same.

Richard.

>Bootstrap/regtest running in x86_64-linux, seems sane?
>
>Honza
>
>static short a __attribute__ ((alias ("c")));
>short b = -1;
>static unsigned short c = 0;
>
>int
>main ()
>{
>  if (a <= b)
>return 1;
>  return 0;
>}
>
>
>   * gimple-fold.c (get_symbol_constant_value): Convert to symbol type.
>
>Index: gimple-fold.c
>===
>--- gimple-fold.c  (revision 221170)
>+++ gimple-fold.c  (working copy)
>@@ -263,7 +263,16 @@ get_symbol_constant_value (tree sym)
>   {
> val = canonicalize_constructor_val (unshare_expr (val), sym);
> if (val && is_gimple_min_invariant (val))
>-  return val;
>+  {
>+  if (!useless_type_conversion_p (TREE_TYPE (sym),
>TREE_TYPE (val)))
>+  {
>+if (operand_equal_p (TYPE_SIZE (TREE_TYPE (sym)),
>+ TYPE_SIZE (TREE_TYPE (val)), 0))
>+  return NULL_TREE;
>+val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (sym), val);
>+  }
>+return val;
>+  }
> else
>   return NULL_TREE;
>   }




Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Richard Biener
On March 5, 2015 7:08:16 PM CET, Jan Hubicka  wrote:
>Hi,
>this patch sovles the incorrect folding. The very same unification
>(ignoring
>signedness by checking that memory representation is the same) is done
>by
>constant pool.
>
>Some of the other uses of ctor_for_folding therefore already uses
>VIEW_CONVERT_EXPR, I suppose as a partial fix for past bugs. This
>particular
>case is handled by get_symbol_constant_value that does not VCE. Maybe
>we usually don't drop scalar constant to constant pool that often, so
>this
>did not show up.
>
>Attached is non-ICF testcase. It is bit questionable if we consider
>this to be
>valid, but it is better to be safe than sorry.  Mixing signed/unsigned
>may be
>more common with LTO.
>
>Bootstrap/regtest running in x86_64-linux, seems sane?
>
>Honza
>
>static short a __attribute__ ((alias ("c")));
>short b = -1;
>static unsigned short c = 0;
>
>int
>main ()
>{
>  if (a <= b)
>return 1;
>  return 0;
>}
>
>
>   * gimple-fold.c (get_symbol_constant_value): Convert to symbol type.
>
>Index: gimple-fold.c
>===
>--- gimple-fold.c  (revision 221170)
>+++ gimple-fold.c  (working copy)
>@@ -263,7 +263,16 @@ get_symbol_constant_value (tree sym)
>   {
> val = canonicalize_constructor_val (unshare_expr (val), sym);
> if (val && is_gimple_min_invariant (val))
>-  return val;
>+  {
>+  if (!useless_type_conversion_p (TREE_TYPE (sym),
>TREE_TYPE (val)))
>+  {
>+if (operand_equal_p (TYPE_SIZE (TREE_TYPE (sym)),
>+ TYPE_SIZE (TREE_TYPE (val)), 0))
>+  return NULL_TREE;

And no, I don't think this is sane.  Callers need to handle mismatches IIRC.

>+val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (sym), val);
>+  }
>+return val;
>+  }
> else
>   return NULL_TREE;
>   }




Re: [PATCH] Fix PR ipa/65318

2015-03-05 Thread Jan Hubicka
> >Index: gimple-fold.c
> >===
> >--- gimple-fold.c(revision 221170)
> >+++ gimple-fold.c(working copy)
> >@@ -263,7 +263,16 @@ get_symbol_constant_value (tree sym)
> > {
> >   val = canonicalize_constructor_val (unshare_expr (val), sym);
> >   if (val && is_gimple_min_invariant (val))
> >-return val;
> >+{
> >+  if (!useless_type_conversion_p (TREE_TYPE (sym),
> >TREE_TYPE (val)))
> >+{
> >+  if (operand_equal_p (TYPE_SIZE (TREE_TYPE (sym)),
> >+   TYPE_SIZE (TREE_TYPE (val)), 0))
> >+return NULL_TREE;
> 
> And no, I don't think this is sane.  Callers need to handle mismatches IIRC.

OK, I am a little bit confused about your MEM_REF suggestion. So you mean that
a MEM_REF should be added around all references to a symbol that is an alias?
Where is that done?  Is there a reason why we do not always add a MEM_REF?  I
would like to keep optimization passes (like ipa-visibility or ICF) able to
turn a symbol into an alias without having to update the underlying IL.

Concerning callers handling mismatches, the VIEW_CONVERT_EXPR thing seems a
valid thing to do for all uses except fold_const_aggregate_ref_1. So perhaps
we can just inline the rest of get_symbol_constant_value in there and document
that get_symbol_constant_value returns the value in the correct type.

Or am I missing something obvious?

Thanks!
Honza
> 
> >+  val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (sym), val);
> >+}
> >+  return val;
> >+}
> >   else
> > return NULL_TREE;
> > }
> 


Re: [PR58315] reset inlined debug vars at return-to point

2015-03-05 Thread Alexandre Oliva
On Mar  4, 2015, Richard Biener  wrote:

> Compile-time was slightly faster with the patch, 45s vs. 47s,
> but the machine wasn't completely un-loaded.  var-tracking parts:

> unpatched:

>  variable tracking   :   0.63 ( 1%) usr   0.03 ( 1%) sys   0.82 (
> 2%) wall   28641 kB ( 2%) ggc
>  var-tracking dataflow   :   3.72 ( 8%) usr   0.04 ( 1%) sys   3.65 (
> 7%) wall1337 kB ( 0%) ggc
>  var-tracking emit   :   2.63 ( 6%) usr   0.02 ( 1%) sys   2.55 (
> 5%) wall  148835 kB ( 8%) ggc

> patched:

>  variable tracking   :   0.64 ( 1%) usr   0.01 ( 0%) sys   0.72 (
> 1%) wall   24202 kB ( 1%) ggc
>  var-tracking dataflow   :   1.96 ( 4%) usr   0.01 ( 0%) sys   1.94 (
> 4%) wall1326 kB ( 0%) ggc
>  var-tracking emit   :   1.46 ( 3%) usr   0.02 ( 0%) sys   1.49 (
> 3%) wall   46980 kB ( 3%) ggc

> we have at the point of RTL expansion 56% more debug statements
> (988231 lines with # DEBUG in the .optimized dump out of
> 1212518 lines in total vs. 630666 out of 854953).  So we go from
> roughly 1 debug stmt per real stmt to 1.5 debug stmts per real stmt.

So, if I got this right, all these extra debug stmts and insns had no
effect whatsoever on compile time proper.  The reduction in compile time
can be entirely accounted for by the time they save in the var-tracking
parts, and any compile time increase they bring about in other passes is
negligible.

Does this match your assessment of the impact of the patch?


> we have two pairs of DEBUG stmts for dD.570173 here, binding
> to _25 and then immediately resetting.

They're at different lines, and associated with different statements, so
once we have stmt frontiers support in GCC and GDB, you will be able to
stop between them an inspect the state, regardless of the absence of
executable code between them.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


patch for PR64342

2015-03-05 Thread Vladimir Makarov
  The following patch fixes bad code generation for avx512f-kandnw-1.c
reported in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64342

  The patch was bootstrapped and tested on x86-64.

  Committed as rev. 221223.

2015-03-05  Vladimir Makarov  

PR target/64342
* lra-assigns.c (find_hard_regno_for): Rename to
find_hard_regno_for_1.  Add a new parameter.
(find_hard_regno_for): New function using find_hard_regno_for_1.

Index: lra-assigns.c
===
--- lra-assigns.c	(revision 220916)
+++ lra-assigns.c	(working copy)
@@ -491,10 +491,13 @@ adjust_hard_regno_cost (int hard_regno,
pseudos in complicated situations where pseudo sizes are different.
 
If TRY_ONLY_HARD_REGNO >= 0, consider only that hard register,
-   otherwise consider all hard registers in REGNO's class.  */
+   otherwise consider all hard registers in REGNO's class.
+
+   If REGNO_SET is not empty, only hard registers from the set are
+   considered.  */
 static int
-find_hard_regno_for (int regno, int *cost, int try_only_hard_regno,
-		 bool first_p)
+find_hard_regno_for_1 (int regno, int *cost, int try_only_hard_regno,
+		   bool first_p, HARD_REG_SET regno_set)
 {
   HARD_REG_SET conflict_set;
   int best_cost = INT_MAX, best_priority = INT_MIN, best_usage = INT_MAX;
@@ -509,7 +512,13 @@ find_hard_regno_for (int regno, int *cos
   bool *rclass_intersect_p;
   HARD_REG_SET impossible_start_hard_regs, available_regs;
 
-  COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
+  if (hard_reg_set_empty_p (regno_set))
+COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
+  else
+{
+  COMPL_HARD_REG_SET (conflict_set, regno_set);
+  IOR_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
+}
   rclass = regno_allocno_class_array[regno];
   rclass_intersect_p = ira_reg_classes_intersect_p[rclass];
   curr_hard_regno_costs_check++;
@@ -680,6 +689,33 @@ find_hard_regno_for (int regno, int *cos
   return best_hard_regno;
 }
 
+/* A wrapper for find_hard_regno_for_1 (see comments for that function
+   description).  This function tries to find a hard register for
+   preferred class first if it is worth.  */
+static int
+find_hard_regno_for (int regno, int *cost, int try_only_hard_regno, bool first_p)
+{
+  int hard_regno;
+  HARD_REG_SET regno_set;
+
+  /* Only original pseudos can have a different preferred class.  */
+  if (try_only_hard_regno < 0 && regno < lra_new_regno_start)
+{
+  enum reg_class pref_class = reg_preferred_class (regno);
+  
+  if (regno_allocno_class_array[regno] != pref_class)
+	{
+	  hard_regno = find_hard_regno_for_1 (regno, cost, -1, first_p,
+	  reg_class_contents[pref_class]);
+	  if (hard_regno >= 0)
+	return hard_regno;
+	}
+}
+  CLEAR_HARD_REG_SET (regno_set);
+  return find_hard_regno_for_1 (regno, cost, try_only_hard_regno, first_p,
+regno_set);
+}
+
 /* Current value used for checking elements in
update_hard_regno_preference_check.	*/
 static int curr_update_hard_regno_preference_check;


Re: [PR58315] reset inlined debug vars at return-to point

2015-03-05 Thread Richard Biener
On March 5, 2015 8:26:42 PM CET, Alexandre Oliva  wrote:
>On Mar  4, 2015, Richard Biener  wrote:
>
>> Compile-time was slightly faster with the patch, 45s vs. 47s,
>> but the machine wasn't completely un-loaded.  var-tracking parts:
>
>> unpatched:
>
>>  variable tracking   :   0.63 ( 1%) usr   0.03 ( 1%) sys   0.82 (
>> 2%) wall   28641 kB ( 2%) ggc
>>  var-tracking dataflow   :   3.72 ( 8%) usr   0.04 ( 1%) sys   3.65 (
>> 7%) wall1337 kB ( 0%) ggc
>>  var-tracking emit   :   2.63 ( 6%) usr   0.02 ( 1%) sys   2.55 (
>> 5%) wall  148835 kB ( 8%) ggc
>
>> patched:
>
>>  variable tracking   :   0.64 ( 1%) usr   0.01 ( 0%) sys   0.72 (
>> 1%) wall   24202 kB ( 1%) ggc
>>  var-tracking dataflow   :   1.96 ( 4%) usr   0.01 ( 0%) sys   1.94 (
>> 4%) wall1326 kB ( 0%) ggc
>>  var-tracking emit   :   1.46 ( 3%) usr   0.02 ( 0%) sys   1.49 (
>> 3%) wall   46980 kB ( 3%) ggc
>
>> we have at the point of RTL expansion 56% more debug statements
>> (988231 lines with # DEBUG in the .optimized dump out of
>> 1212518 lines in total vs. 630666 out of 854953).  So we go from
>> roughly 1 debug stmt per real stmt to 1.5 debug stmts per real stmt.
>
>So, if I got this right, all these extra debug stmts and insns had no
>effect whatsoever on compile time proper.  The reduction in compile
>time
>can be entirely accounted for by the time they save in the var-tracking
>parts, and any compile time increase they bring about in other passes
>is
>negligible.
>
>Does this match your assessment of the impact of the patch?

For the effect on tramp3d, yes.

The positive effect on var-tracking compile time really looks good.  So I'm 
tempted to approve the patch for 5.0.

>> we have two pairs of DEBUG stmts for dD.570173 here, binding
>> to _25 and then immediately resetting.
>
>They're at different lines, and associated with different statements,
>so
>once we have stmt frontiers support in GCC and GDB, you will be able to
>stop between them an inspect the state, regardless of the absence of
>executable code between them.

I wonder why we use the same decl uid for two different inline instances 
though. Do we not remap them during inlining?

Richard.




[PATCH] Fix ubsan's flexible array member handling (PR sanitizer/65280)

2015-03-05 Thread Marek Polacek
ubsan's code to determine whether we're dealing with a flexible array member
didn't check for a COMPONENT_REF, but it should, since flexible array members
can only occur in a structure.  Consequently, we didn't instrument stuff we
should instrument.
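
For reference, a minimal sketch (not part of the patch) of the distinction the
COMPONENT_REF check draws: the struct member access below is a flexible array
member-like access and stays uninstrumented, while the plain pointer-to-array
access is not a COMPONENT_REF and should now be instrumented (compare
bounds-8.c below).

struct flex { int len; int data[]; };

int
sketch (struct flex *f, int (*a)[1])
{
  int x = f->data[5];   /* COMPONENT_REF through a pointer: treated as
                           flexible-array-member-like, not instrumented.  */
  int y = (*a)[2];      /* no COMPONENT_REF: now instrumented.  */
  return x + y;
}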

Bootstrapped/regtested on x86_64-linux, if I hear no objections, I'll apply
this tomorrow.

2015-03-05  Marek Polacek  
Martin Uecker  

PR sanitizer/65280
* doc/invoke.texi: Update description of -fsanitize=bounds.

* c-ubsan.c (ubsan_instrument_bounds): Check for COMPONENT_REF
before trying to figure out whether we have a flexible array member.

* c-c++-common/ubsan/bounds-1.c: Add testing of flexible array
member-like arrays.
* c-c++-common/ubsan/bounds-8.c: New test.
* c-c++-common/ubsan/bounds-9.c: New test.
* gcc.dg/ubsan/bounds-2.c: New test.

diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
index 90d59c0..a14426f 100644
--- gcc/c-family/c-ubsan.c
+++ gcc/c-family/c-ubsan.c
@@ -303,8 +303,9 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
 
   /* Detect flexible array members and suchlike.  */
   tree base = get_base_address (array);
-  if (base && (TREE_CODE (base) == INDIRECT_REF
-  || TREE_CODE (base) == MEM_REF))
+  if (TREE_CODE (array) == COMPONENT_REF
+  && base && (TREE_CODE (base) == INDIRECT_REF
+ || TREE_CODE (base) == MEM_REF))
 {
   tree next = NULL_TREE;
   tree cref = array;
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 006a852..67814d4 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -5704,8 +5704,8 @@ a++;
 @item -fsanitize=bounds
 @opindex fsanitize=bounds
 This option enables instrumentation of array bounds.  Various out of bounds
-accesses are detected.  Flexible array members and initializers of variables
-with static storage are not instrumented.
+accesses are detected.  Flexible array members, flexible array member-like
+arrays, and initializers of variables with static storage are not instrumented.
 
 @item -fsanitize=alignment
 @opindex fsanitize=alignment
diff --git gcc/testsuite/c-c++-common/ubsan/bounds-1.c 
gcc/testsuite/c-c++-common/ubsan/bounds-1.c
index 20e390f..5014f6f 100644
--- gcc/testsuite/c-c++-common/ubsan/bounds-1.c
+++ gcc/testsuite/c-c++-common/ubsan/bounds-1.c
@@ -6,6 +6,7 @@
 struct S { int a[10]; };
 struct T { int l; int a[]; };
 struct U { int l; int a[0]; };
+struct V { int l; int a[1]; };
 
 __attribute__ ((noinline, noclone))
 void
@@ -64,9 +65,14 @@ main (void)
   struct T *t = (struct T *) __builtin_malloc (sizeof (struct T) + 10);
   t->a[1] = 1;
 
+  /* Don't instrument zero-sized arrays (GNU extension).  */
   struct U *u = (struct U *) __builtin_malloc (sizeof (struct U) + 10);
   u->a[1] = 1;
 
+  /* Don't instrument last array in a struct.  */
+  struct V *v = (struct V *) __builtin_malloc (sizeof (struct V) + 10);
+  v->a[1] = 1;
+
   long int *d[10][5];
   d[9][0] = (long int *) 0;
   d[8][3] = d[9][0];
diff --git gcc/testsuite/c-c++-common/ubsan/bounds-8.c 
gcc/testsuite/c-c++-common/ubsan/bounds-8.c
index e69de29..a43b480 100644
--- gcc/testsuite/c-c++-common/ubsan/bounds-8.c
+++ gcc/testsuite/c-c++-common/ubsan/bounds-8.c
@@ -0,0 +1,13 @@
+/* PR sanitizer/65280 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+
+int
+main (void)
+{
+  int *t = (int *) __builtin_malloc (sizeof (int) * 10);
+  int (*a)[1] = (int (*)[1]) t;
+  (*a)[2] = 1;
+}
+
+/* { dg-output "index 2 out of bounds for type 'int 
\\\[1\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
diff --git gcc/testsuite/c-c++-common/ubsan/bounds-9.c 
gcc/testsuite/c-c++-common/ubsan/bounds-9.c
index e69de29..61c11f4 100644
--- gcc/testsuite/c-c++-common/ubsan/bounds-9.c
+++ gcc/testsuite/c-c++-common/ubsan/bounds-9.c
@@ -0,0 +1,24 @@
+/* PR sanitizer/65280 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* Origin: Martin Uecker  */
+
+void
+foo (volatile int (*a)[3])
+{
+  (*a)[3] = 1; // error
+  a[0][0] = 1; // ok
+  a[1][0] = 1; // ok
+  a[1][4] = 1; // error
+}
+
+int
+main ()
+{
+  volatile int a[20];
+  foo ((int (*)[3]) &a);
+  return 0;
+}
+
+/* { dg-output "index 3 out of bounds for type 'int 
\\\[3\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 4 out of bounds for type 'int \\\[3\\\]'" } */
diff --git gcc/testsuite/gcc.dg/ubsan/bounds-2.c 
gcc/testsuite/gcc.dg/ubsan/bounds-2.c
index e69de29..3e88035 100644
--- gcc/testsuite/gcc.dg/ubsan/bounds-2.c
+++ gcc/testsuite/gcc.dg/ubsan/bounds-2.c
@@ -0,0 +1,18 @@
+/* PR sanitizer/65280 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+
+void
+foo (int n, int (*b)[n])
+{
+  (*b)[n] = 1;
+}
+
+int
+main ()
+{
+  int a[20];
+  foo (3, (int (*)[3]) &a);
+}
+
+/* { dg-output "index 3 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */

Marek


Re: [PATCH] target/65286 - Disable multilib for ppc64le

2015-03-05 Thread Alan Modra
This arranges to build a powerpc64le-linux compiler without -m32
support by default.  Bootstrapped and regression tested on Ubuntu
powerpc64le-linux without --disable-multilib, and on powerpc64-linux
and powerpc-linux.  OK for mainline and branches?

This part of the config.gcc patch does most of the work
-   case ${maybe_biarch}:${enable_targets}:${cpu_is_64bit} in
-   always:* | yes:*powerpc64* | yes:all:* | yes:*:yes)
+   case ${target}:${enable_targets}:${maybe_biarch} in
+   powerpc64-* | powerpc-*:*:yes | *:*powerpc64-*:yes | *:all:yes \
+   | powerpc64le*:*powerpcle* | powerpc64le*:*powerpc-* \
+   | powerpcle-*:*powerpc64le*:yes)

always:* becomes powerpc64-*, ie. exclude powerpc64le
yes:*powerpc64* becomes *:*powerpc64-*:yes, excluding powerpc64le so
that --target=powerpc64le-linux --enable-targets=powerpc64le-linux
doesn't accidentally get you a biarch compiler.
yes:all:* becomes *:all:yes, more or less unchanged.
yes:*:yes becomes powerpc-*:*:yes allowing --target=powerpc-linux
--with-cpu= to continue to build a biarch ppc64
compiler.

Some other notes:
t-fprules setting of MULTILIB variables is in every case overridden by
a following t-file, except for the commented out powerpc-*-openbsd*.
Since the aim of this patch is to build powerpc64le without multilibs,
the default setting of these vars needs to go.  t-ppcos needs to be
removed from powerpc64le configurations for the same reason.  Oh, and
adding t-fprules and t-ppcos before previous additions to tmake_file
is no longer necessary.  I checked all the other t-files that might be
added for interactions.

The linux64.h change is so that passing -m32 results in
error: -m32 not supported in the configuration
rather than the confusing
error: -m64 requires a PowerPC64 cpu
(Yes, I know using TARGET_64BIT_P would be nicer, but it's probably
better left to a cleanup patch.)

PR target/65286
* config.gcc (powerpc*-*-linux*): Arrange for powerpc64le-linux
to be single-arch by default.  Set cpu_is_64bit for powerpc64
given --with-cpu=native.
* config/rs6000/t-fprules: Do not set default MULTILIB vars.
* config/rs6000/t-linux (MULTIARCH_DIRNAME): Support powerpc64
and powerpc64le.
* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Test
rs6000_isa_flags rather than TARGET_64BIT.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 221164)
+++ gcc/config.gcc  (working copy)
@@ -2337,28 +2337,32 @@
 powerpc*-*-linux*)
tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h freebsd-spec.h 
rs6000/sysv4.h"
extra_options="${extra_options} rs6000/sysv4.opt"
-   tmake_file="rs6000/t-fprules rs6000/t-ppcos ${tmake_file} 
rs6000/t-ppccomm"
+   tmake_file="${tmake_file} rs6000/t-fprules rs6000/t-ppccomm"
extra_objs="$extra_objs rs6000-linux.o"
case ${target} in
powerpc*le-*-*)
tm_file="${tm_file} rs6000/sysv4le.h" ;;
esac
-   maybe_biarch=yes
+   case ${target}:${with_cpu} in
+   powerpc64*: | powerpc64*:native) cpu_is_64bit=yes ;;
+   esac
+   maybe_biarch=${cpu_is_64bit}
+   case ${enable_targets} in
+   *powerpc64*) maybe_biarch=yes ;;
+   esac
case ${target} in
powerpc64*-*-linux*spe* | powerpc64*-*-linux*paired*)
-   echo "*** Configuration ${target} not supported" 1>&2
+   echo "*** Configuration ${target} not supported" 1>&2
exit 1
;;
powerpc*-*-linux*spe* | powerpc*-*-linux*paired*)
maybe_biarch=
;;
-   powerpc64*-*-linux*)
-   test x$with_cpu != x || cpu_is_64bit=yes
-   maybe_biarch=always
-   ;;
esac
-   case ${maybe_biarch}:${enable_targets}:${cpu_is_64bit} in
-   always:* | yes:*powerpc64* | yes:all:* | yes:*:yes)
+   case ${target}:${enable_targets}:${maybe_biarch} in
+   powerpc64-* | powerpc-*:*:yes | *:*powerpc64-*:yes | *:all:yes \
+   | powerpc64le*:*powerpcle* | powerpc64le*:*powerpc-* \
+   | powerpcle-*:*powerpc64le*:yes)
if test x$cpu_is_64bit = xyes; then
tm_file="${tm_file} rs6000/default64.h"
fi
@@ -2379,9 +2383,14 @@
esac
extra_options="${extra_options} rs6000/linux64.opt"
;;
+   powerpc64*)
+   tm_file="${tm_file} rs6000/default64.h rs6000/linux64.h 
glibc-stdint.h"
+   extra_options="${extra_options} rs6000/linux64.opt"
+   tmake_file="${tmake_file} rs6000/t-linux"
+   ;;
*)
tm_file="${tm_file} rs6000/linux.h glibc-stdint.h"
-   tmake_file="$tmake_file rs6000/t-linux"
+   tmake_file="${tmake_file} rs6000/t-ppco

Re: #pragma GCC unroll support

2015-03-05 Thread Mike Stump
On Jan 30, 2015, at 8:27 AM, Mike Stump  wrote:
> On Jan 30, 2015, at 7:49 AM, Joseph Myers  wrote:
>> Use error_at, and %u directly in the format.
> 
> Done.

Ping?



Index: ada/gcc-interface/trans.c
===
--- ada/gcc-interface/trans.c   (revision 220084)
+++ ada/gcc-interface/trans.c   (working copy)
@@ -7870,17 +7870,20 @@ gnat_gimplify_stmt (tree *stmt_p)
  {
/* Deal with the optimization hints.  */
if (LOOP_STMT_IVDEP (stmt))
- gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+ gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 build_int_cst (integer_type_node,
-   annot_expr_ivdep_kind));
+   annot_expr_ivdep_kind),
+NULL_TREE);
if (LOOP_STMT_NO_VECTOR (stmt))
- gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+ gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 build_int_cst (integer_type_node,
-   annot_expr_no_vector_kind));
+   annot_expr_no_vector_kind),
+NULL_TREE);
if (LOOP_STMT_VECTOR (stmt))
- gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+ gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
 build_int_cst (integer_type_node,
-   annot_expr_vector_kind));
+   annot_expr_vector_kind),
+NULL_TREE);
 
gnu_cond
  = build3 (COND_EXPR, void_type_node, gnu_cond, NULL_TREE,
Index: c/c-parser.c
===
--- c/c-parser.c(revision 220084)
+++ c/c-parser.c(working copy)
@@ -1217,9 +1217,9 @@ static void c_parser_statement (c_parser
 static void c_parser_statement_after_labels (c_parser *);
 static void c_parser_if_statement (c_parser *);
 static void c_parser_switch_statement (c_parser *);
-static void c_parser_while_statement (c_parser *, bool);
-static void c_parser_do_statement (c_parser *, bool);
-static void c_parser_for_statement (c_parser *, bool);
+static void c_parser_while_statement (c_parser *, bool, unsigned short);
+static void c_parser_do_statement (c_parser *, bool, unsigned short);
+static void c_parser_for_statement (c_parser *, bool, unsigned short);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -4972,13 +4972,13 @@ c_parser_statement_after_labels (c_parse
  c_parser_switch_statement (parser);
  break;
case RID_WHILE:
- c_parser_while_statement (parser, false);
+ c_parser_while_statement (parser, false, 0);
  break;
case RID_DO:
- c_parser_do_statement (parser, false);
+ c_parser_do_statement (parser, false, 0);
  break;
case RID_FOR:
- c_parser_for_statement (parser, false);
+ c_parser_for_statement (parser, false, 0);
  break;
case RID_CILK_FOR:
  if (!flag_cilkplus)
@@ -5340,7 +5340,7 @@ c_parser_switch_statement (c_parser *par
 */
 
 static void
-c_parser_while_statement (c_parser *parser, bool ivdep)
+c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree block, cond, body, save_break, save_cont;
   location_t loc;
@@ -5354,9 +5354,15 @@ c_parser_while_statement (c_parser *pars
 "%<_Cilk_spawn%> statement cannot be used as a condition for while 
statement"))
 cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
   build_int_cst (integer_type_node,
-  annot_expr_ivdep_kind));
+ annot_expr_ivdep_kind),
+  NULL_TREE);
+  if (unroll && cond != error_mark_node)
+cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+  build_int_cst (integer_type_node,
+ annot_expr_unroll_kind),
+  build_int_cst (integer_type_node, unroll));
   save_break = c_break_label;
   c_break_label = NULL_TREE;
   save_cont = c_cont_label;
@@ -5375,7 +5381,7 @@ c_parser_while_statement (c_parser *pars
 */
 
 static void
-c_parser_do_statement (c_parser *parser, bool ivdep)
+c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree block, cond, body, save_break, save_cont, new_break, new_cont;
   locatio

[patch] Optimize empty class copies within a C++ return statement

2015-03-05 Thread Aldy Hernandez
While looking at PR65284, I was confused by the gimple we generate for 
returns of empty classes by value:


class obj {
  public:
   obj(int);
};

obj funky(){
return obj(555);
}

For the above snippet, we generate:

obj funky() ()
{
  struct obj D.2248;
  struct obj D.2246;

  obj::obj (&D.2246, 555);
  try
{
  return D.2248;
}
  finally
{
  D.2246 = {CLOBBER};
}
}

Particularly confusing is the return of uninitialized D.2248.  Jason 
tried to beat into me today the fact that it doesn't matter because 
there is no value to initialize since the class is empty.  I still think 
it's weird, however...


What led us to the above gimple was the fact that we lowered into:

return retval = D.2248 = D.2246

which was later optimized by cp_gimplify_expr into "return D.2248" 
because the C++ gimplifier hook notices that the copy is unnecessary.


Jason suggested that it would be nice to remove D.2248 altogether.  With 
the attached patch we notice the superfluous copy in the return value 
and optimize it away.  After some hoops we now get:


  obj::obj (&D.2246, 555);
  try
{
  return <retval>;
}
  finally
{
  D.2246 = {CLOBBER};
}

Tested on x86-64 Linux.

OK?
commit 8fd545d608f2a6c11f889e11c700711b8f911c02
Author: Aldy Hernandez 
Date:   Thu Mar 5 12:23:27 2015 -0800

* cp-gimplify.c (cp_gimplify_expr): Optimize empty class copies
within a return statement.

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 4233a64..f8d4559 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -740,6 +740,29 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p)
}
   break;
 
+case RETURN_EXPR:
+  {
+   /* Optimize `return <retval> = object' where object's type is
+  an empty class.  Avoid the copy, altogether and just return
+  retval.  */
+   tree ret = TREE_OPERAND (*expr_p, 0);
+   if (ret && (TREE_CODE (ret) == INIT_EXPR
+   || TREE_CODE (ret) == MODIFY_EXPR)
+   && TREE_CODE (TREE_OPERAND (ret, 0)) == RESULT_DECL
+   && is_gimple_lvalue (TREE_OPERAND (ret, 0))
+   && is_really_empty_class (TREE_TYPE (TREE_OPERAND (ret, 0
+ {
+   tree result_decl = TREE_OPERAND (ret, 0);
+   tree list = alloc_stmt_list ();
+   append_to_statement_list (TREE_OPERAND (ret, 1), &list);
+   append_to_statement_list (build1 (RETURN_EXPR, void_type_node,
+ result_decl), &list);
+   *expr_p = list;
+   return GS_OK;
+ }
+   /* Otherwise fall through.  */
+  }
+
 default:
   ret = (enum gimplify_status) c_gimplify_expr (expr_p, pre_p, post_p);
   break;
diff --git a/gcc/testsuite/g++.dg/other/empty-class.C 
b/gcc/testsuite/g++.dg/other/empty-class.C
new file mode 100644
index 000..a14c437
--- /dev/null
+++ b/gcc/testsuite/g++.dg/other/empty-class.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Test that we return retval directly, instead of going through an
+   intermediate temporary, when returning an empty class.  */
+
+class obj {
+  public:
+   obj(int);
+};
+
+obj funky(){
+return obj(555);
+}
+
+/* { dg-final { scan-tree-dump-times "return <retval>;" 1 "gimple" } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */


Re: [PATCH] PR63175 - [4.9/5 regression] FAIL: gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c scan-tree-dump-times slp2" basic block vectorized using SLP" 1

2015-03-05 Thread Martin Sebor

Attached is a scaled down version of the test for the bug.
It fixes the scan-tree-dump-times string to match what GCC
5 prints and moves the result checking out of the test
function and into main to prevent it from getting optimized
away (as observed in comment #8 on the bug).

The patch also adds a regression test for the bug to scan
the assembly for the absence of ordinary loads and stores.

Tested on ppc64le-linux.

Does it look okay to everyone?

Martin
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 2e77ba4..27f41fd 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,11 @@
+2015-03-05  Martin Sebor  
+
+	* PR testsuite/63175
+	* gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c (main1): Move
+	checking of results into main to prevent it from getting optimized
+	away.
+	* gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c: New test.
+
 2015-03-04  Ian Lance Taylor  
 
 	* go.test/go-test.exp (go-gc-tests): Skip nilptr test on s390*.
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
new file mode 100644
index 000..73c0afa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
@@ -0,0 +1,30 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-do compile } */
+
+#define N 16 
+
+const unsigned int in[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int out[N];
+
+__attribute__ ((noinline)) int
+main1 (void)
+{
+  const unsigned int *pin = &in[1];
+  unsigned int *pout = &out[0];
+
+  /* Misaligned load.  */
+  *pout++ = *pin++;
+  *pout++ = *pin++;
+  *pout++ = *pin++;
+  *pout++ = *pin++;
+
+  return 0;
+}
+
+/* Verify that the assembly contains vector instructions alone
+   with no word loads (lw, lwu, lwz, lwzu, or their indexed forms)
+   or word stores (stw, stwu, stwx, stwux, or their indexed forms).  */
+
+/* { dg-final { scan-assembler "\t\(lxv|lvsr|stxv\)" } } */
+/* { dg-final { scan-assembler-not "\tlwz?u?x? " } } */
+/* { dg-final { scan-assembler-not "\tstwu?x? " } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c
index e1bc1a8..45046f4 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a.c
@@ -1,6 +1,5 @@
 /* { dg-require-effective-target vect_int } */
 
-#include 
 #include "../../tree-vect.h"
 
 #define N 16 
@@ -9,12 +8,10 @@ unsigned int out[N];
 unsigned int in[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
 
 __attribute__ ((noinline)) int
-main1 (unsigned int x, unsigned int y)
+main1 (void)
 {
-  int i;
   unsigned int *pin = &in[1];
   unsigned int *pout = &out[0];
-  unsigned int a0, a1, a2, a3;
 
   /* Misaligned load.  */
   *pout++ = *pin++;
@@ -22,13 +19,6 @@ main1 (unsigned int x, unsigned int y)
   *pout++ = *pin++;
   *pout++ = *pin++;
 
-  /* Check results.  */
-  if (out[0] != in[1]
-  || out[1] != in[2]
-  || out[2] != in[3]
-  || out[3] != in[4])
-abort();
-
   return 0;
 }
 
@@ -36,11 +26,17 @@ int main (void)
 {
   check_vect ();
 
-  main1 (2, 3);
+  main1 ();
+
+  /* Check results.  */
+  if (out[0] != in[1]
+  || out[1] != in[2]
+  || out[2] != in[3]
+  || out[3] != in[4])
+abort();
 
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp2"  { xfail  vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2"  { xfail  vect_no_align } } } */
 /* { dg-final { cleanup-tree-dump "slp2" } } */
-  


Go patch committed: Do not declare type switch variable outside case statements

2015-03-05 Thread Ian Lance Taylor
This patch by Chris Manghane fixes a bug for cases like this:
switch x := v.(type) {
case *x:
in which the type name in the case happens to be the same as the
variable name in the type switch.  This is rather confusing code, but
it should work.  This is http://golang.org/issue/10047 .  Bootstrapped
and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to
mainline.

Ian
diff -r d42a0819e2eb go/gogo.cc
--- a/go/gogo.ccFri Feb 06 08:17:54 2015 -0800
+++ b/go/gogo.ccThu Mar 05 16:21:29 2015 -0800
@@ -6030,6 +6030,7 @@
   Type* type = this->type_;
   Expression* init = this->init_;
   if (this->is_type_switch_var_
+  && type != NULL
   && this->type_->is_nil_constant_as_type())
 {
   Type_guard_expression* tge = this->init_->type_guard_expression();
@@ -6103,7 +6104,9 @@
   // type here.  It will have an initializer which is a type guard.
   // We want to initialize it to the value without the type guard, and
   // use the type of that value as well.
-  if (this->is_type_switch_var_ && this->type_->is_nil_constant_as_type())
+  if (this->is_type_switch_var_
+  && this->type_ != NULL
+  && this->type_->is_nil_constant_as_type())
 {
   Type_guard_expression* tge = this->init_->type_guard_expression();
   go_assert(tge != NULL);
diff -r d42a0819e2eb go/parse.cc
--- a/go/parse.cc   Fri Feb 06 08:17:54 2015 -0800
+++ b/go/parse.cc   Thu Mar 05 16:21:29 2015 -0800
@@ -50,8 +50,7 @@
 break_stack_(NULL),
 continue_stack_(NULL),
 iota_(0),
-enclosing_vars_(),
-type_switch_vars_()
+enclosing_vars_()
 {
 }
 
@@ -4596,32 +4595,33 @@
 Parse::type_switch_body(Label* label, const Type_switch& type_switch,
Location location)
 {
-  Named_object* switch_no = NULL;
-  if (!type_switch.name.empty())
-{
-  if (Gogo::is_sink_name(type_switch.name))
-   error_at(type_switch.location,
-"no new variables on left side of %<:=%>");
+  Expression* init = type_switch.expr;
+  std::string var_name = type_switch.name;
+  if (!var_name.empty())
+{
+  if (Gogo::is_sink_name(var_name))
+{
+  error_at(type_switch.location,
+   "no new variables on left side of %<:=%>");
+  var_name.clear();
+}
   else
{
- Variable* switch_var = new Variable(NULL, type_switch.expr, false,
- false, false,
- type_switch.location);
- switch_no = this->gogo_->add_variable(type_switch.name, switch_var);
+  Location loc = type_switch.location;
+ Temporary_statement* switch_temp =
+  Statement::make_temporary(NULL, init, loc);
+ this->gogo_->add_statement(switch_temp);
+  init = Expression::make_temporary_reference(switch_temp, loc);
}
 }
 
   Type_switch_statement* statement =
-Statement::make_type_switch_statement(switch_no,
- (switch_no == NULL
-  ? type_switch.expr
-  : NULL),
- location);
-
+  Statement::make_type_switch_statement(var_name, init, location);
   this->push_break_statement(statement, label);
 
   Type_case_clauses* case_clauses = new Type_case_clauses();
   bool saw_default = false;
+  std::vector<Named_object*> implicit_vars;
   while (!this->peek_token()->is_op(OPERATOR_RCURLY))
 {
   if (this->peek_token()->is_eof())
@@ -4629,7 +4629,8 @@
  error_at(this->location(), "missing %<}%>");
  return NULL;
}
-  this->type_case_clause(switch_no, case_clauses, &saw_default);
+  this->type_case_clause(var_name, init, case_clauses, &saw_default,
+ &implicit_vars);
 }
   this->advance_token();
 
@@ -4637,14 +4638,36 @@
 
   this->pop_break_statement();
 
+  // If there is a type switch variable implicitly declared in each case 
clause,
+  // check that it is used in at least one of the cases.
+  if (!var_name.empty())
+{
+  bool used = false;
+  for (std::vector<Named_object*>::iterator p = implicit_vars.begin();
+  p != implicit_vars.end();
+  ++p)
+   {
+ if ((*p)->var_value()->is_used())
+   {
+ used = true;
+ break;
+   }
+   }
+  if (!used)
+   error_at(type_switch.location, "%qs declared and not used",
+Gogo::message_name(var_name).c_str());
+}
   return statement;
 }
 
 // TypeCaseClause  = TypeSwitchCase ":" [ StatementList ] .
+// IMPLICIT_VARS is the list of variables implicitly declared for each type
+// case if there is a type switch variable declared.
 
 void
-Parse::type_case_clause(Named_object* switch_no, Type_case_clauses* clauses,
-   bool* saw_default)
+Parse::type_case_clause(const std::string& var_name, Expression* init,
+  

[PATCH] PR target/65248: Copy relocation against protected symbol doesn't work

2015-03-05 Thread H.J. Lu
Protected data symbol means that it can't be pre-empted.  It doesn't mean
its address won't be external.  This is true for a pointer to a protected
function.  With copy relocation, the address of protected data defined in the
shared library may also be external.  We only know that for sure at
run-time.  TARGET_BINDS_LOCAL_P should return false on a protected data
symbol.  OK for trunk?
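
For illustration, a minimal sketch of the situation described above
(hypothetical file names, not part of the patch):

/* libfoo.c, built with -fPIC into libfoo.so: */
__attribute__ ((visibility ("protected"))) int prot_var = 1;

int *
prot_var_addr (void)
{
  return &prot_var;
}

/* main.c, built non-PIC and linked against libfoo.so: a copy
   relocation may place prot_var in the executable, so the address
   the library computes for prot_var can be external to the library,
   which is only known at run-time.  */
extern int prot_var;
extern int *prot_var_addr (void);

int
main (void)
{
  return &prot_var != prot_var_addr ();
}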

Thanks.

H.J.
---
PR target/65248
* output.h (default_binds_local_p_2): New.
* varasm.c (default_binds_local_p_2): Renamed to ...
(default_binds_local_p_3): This.  Don't return true on protected
data symbol if protected data may be external.
(default_binds_local_p): Use default_binds_local_p_3.
(default_binds_local_p_1): Likewise.
(default_binds_local_p_2): New.
* config/i386/i386.c (TARGET_BINDS_LOCAL_P): Replace
darwin_binds_local_p with default_binds_local_p_2.
---
 gcc/config/i386/i386.c |  3 +++
 gcc/output.h   |  1 +
 gcc/varasm.c   | 18 +++---
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ab8f03a..41a487a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -51878,6 +51878,9 @@ ix86_initialize_bounds (tree var, tree lb, tree ub, 
tree *stmts)
 #if TARGET_MACHO
 #undef TARGET_BINDS_LOCAL_P
 #define TARGET_BINDS_LOCAL_P darwin_binds_local_p
+#else
+#undef TARGET_BINDS_LOCAL_P
+#define TARGET_BINDS_LOCAL_P default_binds_local_p_2
 #endif
 #if TARGET_DLLIMPORT_DECL_ATTRIBUTES
 #undef TARGET_BINDS_LOCAL_P
diff --git a/gcc/output.h b/gcc/output.h
index 217d979..53e47d0 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -586,6 +586,7 @@ extern void default_asm_output_anchor (rtx);
 extern bool default_use_anchors_for_symbol_p (const_rtx);
 extern bool default_binds_local_p (const_tree);
 extern bool default_binds_local_p_1 (const_tree, int);
+extern bool default_binds_local_p_2 (const_tree);
 extern void default_globalize_label (FILE *, const char *);
 extern void default_globalize_decl_name (FILE *, tree);
 extern void default_emit_unwind_label (FILE *, tree, int, int);
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 8173207..87ac646 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6803,7 +6803,8 @@ resolution_local_p (enum ld_plugin_symbol_resolution 
resolution)
 }
 
 static bool
-default_binds_local_p_2 (const_tree exp, bool shlib, bool weak_dominate)
+default_binds_local_p_3 (const_tree exp, bool shlib, bool weak_dominate,
+bool extern_protected_data)
 {
   /* A non-decl is an entry in the constant pool.  */
   if (!DECL_P (exp))
@@ -6849,6 +6850,9 @@ default_binds_local_p_2 (const_tree exp, bool shlib, bool 
weak_dominate)
  or if we have a definition for the symbol.  We cannot infer visibility
  for undefined symbols.  */
   if (DECL_VISIBILITY (exp) != VISIBILITY_DEFAULT
+  && (TREE_CODE (exp) == FUNCTION_DECL
+ || !extern_protected_data
+ || DECL_VISIBILITY (exp) != VISIBILITY_PROTECTED)
   && (DECL_VISIBILITY_SPECIFIED (exp) || defined_locally))
 return true;
 
@@ -6884,13 +6888,21 @@ default_binds_local_p_2 (const_tree exp, bool shlib, 
bool weak_dominate)
 bool
 default_binds_local_p (const_tree exp)
 {
-  return default_binds_local_p_2 (exp, flag_shlib != 0, true);
+  return default_binds_local_p_3 (exp, flag_shlib != 0, true, false);
+}
+
+/* Similar to default_binds_local_p, but protected data may be
+   external.  */
+bool
+default_binds_local_p_2 (const_tree exp)
+{
+  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true);
 }
 
 bool
 default_binds_local_p_1 (const_tree exp, int shlib)
 {
-  return default_binds_local_p_2 (exp, shlib != 0, false);
+  return default_binds_local_p_3 (exp, shlib != 0, false, false);
 }
 
 /* Return true when references to DECL must bind to current definition in
-- 
1.9.3



Re: #pragma GCC unroll support

2015-03-05 Thread Sandra Loosemore

On 03/05/2015 04:12 PM, Mike Stump wrote:


Ping?



Just commenting on the documentation part:


Index: doc/extend.texi
===
--- doc/extend.texi (revision 220084)
+++ doc/extend.texi (working copy)
@@ -17881,6 +17881,18 @@ void ignore_vec_dep (int *a, int k, int
 @}
 @end smallexample

+@table @code
+@item #pragma GCC unroll @var{n}
+@cindex pragma GCC unroll @var{n}
+
+With this pragma, the programmer informs the optimizer how many times
+a loop should be unrolled.  A 0 or 1 informs the compiler to not
+perform any loop unrolling.  The pragma must be immediately before
+@samp{#pragma ivdep} or a @code{for}, @code{while} or @code{do} loop
+and applies only to the loop that follows.  @var{n} is an
+assignment-expression that evaluates to an integer constant.
+
+@end table

 @node Unnamed Fields
 @section Unnamed struct/union fields within structs/unions


User documentation shouldn't refer to the reader as "the programmer"; 
either use the second person "you" or the imperative.  I'd also 
rearrange the paragraph slightly to put the two sentences about the 
parameter together, something like:


Use this pragma to inform the compiler how many times a loop should be 
unrolled.  The pragma must be immediately before

@samp{#pragma ivdep} or a @code{for}, @code{while} or @code{do} loop
and applies only to the loop that follows.  @var{n} is an
assignment-expression that evaluates to an integer constant.
A 0 or 1 informs the compiler to not perform any loop unrolling.
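
For illustration, a minimal usage sketch (assuming the syntax in the proposed
documentation above):

void
scale (float *a, int n)
{
  /* Request that the following loop be unrolled 4 times; a factor of
     0 or 1 would request no unrolling.  */
  #pragma GCC unroll 4
  for (int i = 0; i < n; i++)
    a[i] = a[i] * 2.0f;
}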

-Sandra



Re: [PATCH] gcc/genrecog.c: Check matching constraint in MATCH_OPERAND.

2015-03-05 Thread Chen Gang
Hello Maintainers:

Please help check this patch when you have time.


I have to leave Sunrus; the mail address (gang.c...@sunrus.com.cn) will
be closed soon (Sunrus will be closed soon because of money, I guess).

I am switching to my new email address (xili_gchen_5...@hotmail.com) to
continue communicating. gang.chen.5...@gmail.com still works, but it is
not stable (gmail in China is not stable).

I apologize for the inconvenience.


Thanks.

On 2/27/15 08:37, Chen Gang S wrote:
> On 02/26/2015 08:13 PM, Chen Gang S wrote:
>> On 02/26/2015 04:04 PM, Chen Gang S wrote:
>>> If check fails, genrecog needs to stop and report error, so can let the
>>> issue be found in time. The implementation is referenced from "gcc/doc/
>>> md.log":
>>>
>>>   [...]
>>>
>>>   whitespace
>>>Whitespace characters are ignored and can be inserted at any
>>>position except the first.  This enables each alternative for
>>>different operands to be visually aligned in the machine
>>>description even if they have different number of constraints and
>>>modifiers.
>>>
>>>   [...]
>>>
>>>   '0', '1', '2', ... '9'
>>>[...]
>>>If a digit is used together with letters within the same
>>>alternative, the digit should come last.
>>>
>>
>> Oh, I guess, I misunderstood the contents above. e.g. "Up3" which
>> defined in aarch64 is not "matching constraint", I should skip it.
>>
> 
> If I really misunderstood, for me, the related patch should be:
> 
> diff --git a/gcc/genrecog.c b/gcc/genrecog.c
> index 81a0e79..9367d74 100644
> --- a/gcc/genrecog.c
> +++ b/gcc/genrecog.c
> @@ -503,7 +503,8 @@ validate_pattern (rtx pattern, rtx insn, rtx set, int 
> set_code)
>  
>   if (code == MATCH_OPERAND)
> {
> - const char constraints0 = XSTR (pattern, 2)[0];
> + const char *constraints = XSTR (pattern, 2);
> + const char constraints0 = constraints[0];
>  
>   if (!constraints_supported_in_insn_p (insn))
> {
> @@ -537,6 +538,33 @@ validate_pattern (rtx pattern, rtx insn, rtx set, int 
> set_code)
>  "operand %d missing output reload",
>  XINT (pattern, 0));
> }
> +
> + /* For matching constraint in MATCH_OPERAND, the digit must be a
> +smaller number than the number of the operand that uses it in the
> +constraint.  */
> + while (1)
> +   {
> + while (constraints[0]
> +&& (constraints[0] == ' ' || constraints[0] == ','))
> +   constraints++;
> + if (!constraints[0])
> +   break;
> +
> + if (constraints[0] >= '0' && constraints[0] <= '9')
> +   {
> + int val;
> +
> + sscanf (constraints, "%d", &val);
> + if (val >= XINT (pattern, 0))
> +   error_with_line (pattern_lineno,
> +"constraint digit %d is not smaller than"
> +" operand %d",
> +val, XINT (pattern, 0));
> +   }
> +
> + while (constraints[0] && constraints[0] != ',')
> +   constraints++;
> +   }
> }
>  
>   /* Allowing non-lvalues in destinations -- particularly CONST_INT --
> 
> 
> If necessary (I guess, it is), I shall send patch v2 for it.
> 
> Thanks.
> 
>>>This number is allowed to be more than a single digit.  If multiple
>>>digits are encountered consecutively, they are interpreted as a
>>>single decimal integer. [...]
>>>
>>>[...]
>>>
>>>[...]   Moreover, the digit must be a
>>>smaller number than the number of the operand that uses it in the
>>>constraint.
>>>
>>>   [...]
>>>
>>>   The overall constraint for an operand is made from the letters for this
>>>   operand from the first alternative, a comma, the letters for this
>>>   operand from the second alternative, a comma, and so on until the last
>>>   alternative.
>>>
>>>   [...]
>>>
>>> The patch passes test:
>>>
>>>  - genrecog can report the related issue when cross compiling xtensa.
>>>And after the related xtensa issue is fixed, genrecog will not report
>>>error, either.
>>>
>>>  - For x86_64-unknown-linux-gnu building all-gcc, it is OK, too.
>>>
>>>
>>> 2015-02-26  Chen Gang  
>>>
>>> * genrecog.c (validate_pattern): Check matching constraint in
>>> MATCH_OPERAND and use 'opnu' for all 'XINT (pattern, 0)'.
>>> ---
>>>  gcc/genrecog.c | 39 +++
>>>  1 file changed, 35 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/gcc/genrecog.c b/gcc/genrecog.c
>>> index 81a0e79..4b508e8 100644
>>> --- a/gcc/genrecog.c
>>> +++ b/gcc/genrecog.c
>>> @@ -503,7 +503,9 @@ validate_pattern (rtx pattern, rtx insn, rtx set, int 
>>> set_code)
>>>  
>>> if (code == MAT

Re: RFC: PATCHES: Properly handle reference to protected data on x86

2015-03-05 Thread Alan Modra
On Wed, Mar 04, 2015 at 03:26:10PM -0800, H.J. Lu wrote:
> Protected symbol means that it can't be pre-emptied.  It
> doesn't mean its address won't be external.  This is true
> for pointer to protected function.  With copy relocation,
> address of protected data defined in the shared library may
> also be external.  We only know that for sure at run-time.
> Here are patches for glibc, binutils and GCC to handle it
> properly.
> 
> Any comments?

I'd like to see this pass some more tests.  For example

reference in non-PIC exe to var x
protected visibility definition of x in libA
protected visibility definition of x in libB

I suspect you don't have this case correct, but congratulations if you
do!  Assuming libA is first on the breadth-first search for libraries,
then the exe and libA ought to use the same x, but libB should have its own x.

In fact it would be good to prove that all variations of either a
reference, a default visibility definition or a protected visibility
definition worked in the exe plus two libs case.
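
For concreteness, a minimal sketch of the first scenario (hypothetical file
names):

/* libA.c and libB.c, each built with -fPIC into its own shared
   library, both defining a protected-visibility x: */
__attribute__ ((visibility ("protected"))) int x = 1;

/* exe.c, built non-PIC and linked against libA.so then libB.so:
   following the reasoning above, this reference and libA's own uses
   of x ought to resolve to the same object (via a copy relocation),
   while libB keeps its own x.  */
extern int x;

int
main (void)
{
  return x;
}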

-- 
Alan Modra
Australia Development Lab, IBM


RE: [Patch,microblaze]: Optimized usage of pcmp conditional instruction.

2015-03-05 Thread Ajit Kumar Agarwal


-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com] 
Sent: Thursday, February 26, 2015 4:29 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Optimized usage of pcmp conditional 
instruction.

On 02/25/15 02:19, Ajit Kumar Agarwal wrote:
> Hello All:
>
> Please find the patch for the optimized usage of pcmp instructions in 
> microblaze. No regressions is seen In deja GNU tests. There are many 
> testcases that are already there in deja GNU to check the generation of 
> pcmpne/pcmpeq instructions and are used to check the validity.
>
> commit b74acf44ce4286649e5be7cff7518d814cb2491f
> Author: Ajit Kumar Agarwal 
> Date:   Wed Feb 25 15:33:02 2015 +0530
>
>  [Patch,microblaze]: Optimized usage of pcmp conditional instruction.
>
>  The changes are made in the patch for optimized usage of pcmpne/pcmpeq
>  instructions. The xor with register to register is replaced with pcmpeq
>  /pcmpne instructions and for immediate check still the xori will be used.
>  The purpose of the change is to acheive the aggressive usage of pcmpne
>  /pcmpeq instructions instead of xor being used for comparison.
>
>  ChangeLog:
>  2015-02-25  Ajit Agarwal  
>
>  * config/microblaze/microblaze.md (cbranchsi4): Added immediate
>  constraints.
>  (cbranchsi4_reg): New.
>  * config/microblaze/microblaze.c
>  (microblaze_expand_conditional_branch_reg): New.
>  * config/microblaze/microblaze-protos.h
>  (microblaze_expand_conditional_branch_reg): New prototype.

+  if (cmp_op1 == const0_rtx)
+{
+  comp_reg = cmp_op0;
+  condition = gen_rtx_fmt_ee (signed_condition (code),
+  SImode, comp_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (condition, label1));
+}
+
+  else if (code == EQ || code == NE)
+{
+  if (code == NE)
+{
+  emit_insn (gen_sne_internal_pat (comp_reg, cmp_op0,
+   cmp_op1));
+  condition = gen_rtx_NE (SImode, comp_reg, const0_rtx);
+}
+  else
+{
+  emit_insn (gen_seq_internal_pat (comp_reg,
+   cmp_op0, cmp_op1));
+  condition = gen_rtx_EQ (SImode, comp_reg, const0_rtx);
+}
+  emit_jump_insn (gen_condjump (condition, label1));
+}
+  else
+{
...

>>No blank line between end brace of if and else.

>>Replace with
>>+  else if (code == EQ)
>>+{
>>+   emit_insn (gen_seq_internal_pat (comp_reg, cmp_op0, cmp_op1));
>>+   condition = gen_rtx_EQ (SImode, comp_reg, const0_rtx);
>>+   emit_jump_insn (gen_condjump (condition, label1));
>>+}
>>+  else if (code == NE)
>>+{
>>+  emit_insn (gen_sne_internal_pat (comp_reg, cmp_op0, cmp_op1));
>>+  condition = gen_rtx_NE (SImode, comp_reg, const0_rtx);
>>+  emit_jump_insn (gen_condjump (condition, label1));
>>+}
>>+  else
>>+{
>>...

>>--

Changes are incorporated. Please find the log of the updated patch.

commit 91f275c144165320850ddf18e3a1e059a66c
Author: Ajit Kumar Agarwal 
Date:   Fri Mar 6 09:55:11 2015 +0530

[Patch,microblaze]: Optimized usage of pcmp conditional instruction.

The changes are made in the patch for optimized usage of the pcmpne/pcmpeq
instructions. The xor of register with register is replaced with pcmpeq
/pcmpne instructions, while for immediate checks xori will still be used.
The purpose of the change is to achieve aggressive usage of pcmpne
/pcmpeq instructions instead of xor being used for comparison.
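
As an illustration in C terms (a sketch, not from the patch), these are the
kinds of branches this targets:

void ext (void);

void
branch_reg (int a, int b)
{
  if (a == b)   /* register-register compare: can now use pcmpeq/pcmpne.  */
    ext ();
}

void
branch_imm (int a)
{
  if (a != 5)   /* compare against an immediate: still uses xori.  */
    ext ();
}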

ChangeLog:
2015-03-06  Ajit Agarwal  

* config/microblaze/microblaze.md (cbranchsi4): Added immediate
constraints.
(cbranchsi4_reg): New.
* config/microblaze/microblaze.c
(microblaze_expand_conditional_branch_reg): New.
* config/microblaze/microblaze-protos.h
(microblaze_expand_conditional_branch_reg): New prototype.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit
 
Michael Eagerea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


0001-Patch-microblaze-Optimized-usage-of-pcmp-conditional.patch
Description: 0001-Patch-microblaze-Optimized-usage-of-pcmp-conditional.patch


Re: [patch] Optimize empty class copies within a C++ return statement

2015-03-05 Thread Jason Merrill

On 03/05/2015 06:25 PM, Aldy Hernandez wrote:

+   tree ret = TREE_OPERAND (*expr_p, 0);
+   if (ret && (TREE_CODE (ret) == INIT_EXPR
+   || TREE_CODE (ret) == MODIFY_EXPR)
+   && TREE_CODE (TREE_OPERAND (ret, 0)) == RESULT_DECL
+   && is_gimple_lvalue (TREE_OPERAND (ret, 0))
+   && is_really_empty_class (TREE_TYPE (TREE_OPERAND (ret, 0
+ {
+   tree result_decl = TREE_OPERAND (ret, 0);
+   tree list = alloc_stmt_list ();
+   append_to_statement_list (TREE_OPERAND (ret, 1), &list);
+   append_to_statement_list (build1 (RETURN_EXPR, void_type_node,
+ result_decl), &list);
+   *expr_p = list;
+   return GS_OK;
+ }


This should really use the MODIFY_EXPR case rather than duplicate it 
here.  Actually, why don't we already hit that case when processing the 
RETURN_EXPR?


Jason



[PATCH ARM]Print CPU tuning information as comment in assembler file.

2015-03-05 Thread Bin Cheng
Hi,
This patch is the first part of fixing the memset-inline-{4,5,6,8,9}.c
failures on cortex-a9.  GCC/arm doesn't generate any tuning information in
assembly, so the testsuite can't tell whether we are compiling for the
cortex-a9 tune when the compiler is configured that way by default.
This patch introduces a new (target dependent) option "-mprint-tune-info".
It prints CPU tuning information as a comment in the assembler file, so
DejaGnu can check it and make decisions.  By default the option is disabled,
so it won't change current behavior.  For now, pointers in the tune structure
are not printed; we should improve that and output more useful information in
the long run.

A follow-up patch adds a DejaGnu test function and adapts the test strings.

Built and tested on arm-none-eabi.  Is it OK?

2015-03-06  Bin Cheng  

* config/arm/arm.opt (print_tune_info): New option.
* config/arm/arm.c (arm_print_tune_info): New function.
(arm_file_start): Call arm_print_tune_info.
* config/arm/arm-protos.h (struct tune_params): Add comment.
* doc/invoke.texi (@item -mprint-tune-info): New item.
(-mtune): mention it in ARM Option Summary.Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 221098)
+++ gcc/doc/invoke.texi (working copy)
@@ -540,6 +540,7 @@ Objective-C and Objective-C++ Dialects}.
 -mfp16-format=@var{name}
 -mthumb-interwork  -mno-thumb-interwork @gol
 -mcpu=@var{name}  -march=@var{name}  -mfpu=@var{name}  @gol
+-mtune=@var{name} -mprint-tune-info @gol
 -mstructure-size-boundary=@var{n} @gol
 -mabort-on-noreturn @gol
 -mlong-calls  -mno-long-calls @gol
@@ -13295,6 +13296,13 @@ should be considered deprecated.
 Restricts generation of IT blocks to conform to the rules of ARMv8.
 IT blocks can only contain a single 16-bit instruction from a select
 set of instructions. This option is on by default for ARMv8 Thumb mode.
+
+@item -mprint-tune-info
+@opindex mprint-tune-info
+Print CPU tuning information as comment in assembler file.  This is
+an option used only for regression testing of the compiler and not
+intended for ordinary use in compiling code.  This option is disabled
+by default.
 @end table
 
 @node AVR Options
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 221098)
+++ gcc/config/arm/arm.c(working copy)
@@ -25638,6 +25638,59 @@ arm_emit_eabi_attribute (const char *name, int num
   asm_fprintf (asm_out_file, "\n");
 }
 
+/* This function is used to print CPU tuning information as comment
+   in assembler file.  Pointers are not printed for now.  */
+
+void
+arm_print_tune_info (void)
+{
+  asm_fprintf (asm_out_file, "\t@.tune parameters\n");
+  asm_fprintf (asm_out_file, "\t\t@constant_limit:\t%d\n",
+  current_tune->constant_limit);
+  asm_fprintf (asm_out_file, "\t\t@max_insns_skipped:\t%d\n",
+  current_tune->max_insns_skipped);
+  asm_fprintf (asm_out_file, "\t\t@num_prefetch_slots:\t%d\n",
+  current_tune->num_prefetch_slots);
+  asm_fprintf (asm_out_file, "\t\t@l1_cache_size:\t%d\n",
+  current_tune->l1_cache_size);
+  asm_fprintf (asm_out_file, "\t\t@l1_cache_line_size:\t%d\n",
+  current_tune->l1_cache_line_size);
+  asm_fprintf (asm_out_file, "\t\t@prefer_constant_pool:\t%d\n",
+  (int) current_tune->prefer_constant_pool);
+  asm_fprintf (asm_out_file, "\t\t@branch_cost:\t(s:speed, p:predictable)\n");
+  asm_fprintf (asm_out_file, "\t\t\t\ts&p\tcost\n");
+  asm_fprintf (asm_out_file, "\t\t\t\t00\t%d\n",
+  current_tune->branch_cost (false, false));
+  asm_fprintf (asm_out_file, "\t\t\t\t01\t%d\n",
+  current_tune->branch_cost (false, true));
+  asm_fprintf (asm_out_file, "\t\t\t\t10\t%d\n",
+  current_tune->branch_cost (true, false));
+  asm_fprintf (asm_out_file, "\t\t\t\t11\t%d\n",
+  current_tune->branch_cost (true, true));
+  asm_fprintf (asm_out_file, "\t\t@prefer_ldrd_strd:\t%d\n",
+  (int) current_tune->prefer_ldrd_strd);
+  asm_fprintf (asm_out_file, "\t\t@logical_op_non_short_circuit:\t[%d,%d]\n",
+  (int) current_tune->logical_op_non_short_circuit[0],
+  (int) current_tune->logical_op_non_short_circuit[1]);
+  asm_fprintf (asm_out_file, "\t\t@prefer_neon_for_64bits:\t%d\n",
+  (int) current_tune->prefer_neon_for_64bits);
+  asm_fprintf (asm_out_file,
+  "\t\t@disparage_flag_setting_t16_encodings:\t%d\n",
+  (int) current_tune->disparage_flag_setting_t16_encodings);
+  asm_fprintf (asm_out_file,
+  "\t\t@disparage_partial_flag_setting_t16_encodings:\t%d\n",
+  (int) current_tune
+  ->disparage_partial_flag_setting_t16_encodings);
+  asm_fprintf (asm_out_file, "\t\t@string_ops_prefer_neon:\t%d\n",
+  (int) current_tune->string_ops_prefer_neon);
+  asm_fp

[PATCH ARM]Fix memset-inline-* failures on cortex-a9 tune by checking tune information.

2015-03-05 Thread Bin Cheng
Hi,
This patch is the second part of fixing the memset-inline-{4,5,6,8,9}.c
failures on cortex-a9.  It adds a DejaGnu function that checks the CPU tuning
information, and uses that function to skip the related test cases when we are
compiling for the cortex-a9 tune.

Built and tested on arm-none-eabi.  Is it OK?

gcc/testsuite/ChangeLog
2015-03-06  Bin Cheng  

* lib/target-supports.exp (arm_tune_string_ops_prefer_neon): New.
* gcc.target/arm/memset-inline-4.c: Skip for
arm_tune_string_ops_prefer_neon.
* gcc.target/arm/memset-inline-5.c: Ditto.
* gcc.target/arm/memset-inline-6.c: Ditto.
* gcc.target/arm/memset-inline-8.c: Ditto.
* gcc.target/arm/memset-inline-9.c: Ditto.

Index: gcc/testsuite/gcc.target/arm/memset-inline-4.c
===
--- gcc/testsuite/gcc.target/arm/memset-inline-4.c  (revision 221098)
+++ gcc/testsuite/gcc.target/arm/memset-inline-4.c  (working copy)
@@ -1,6 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mcpu=cortex-a9" } { "" } } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mtune=cortex-a9" } { "" } } */
+/* { dg-skip-if "Don't inline memset using neon instructions" { ! 
arm_tune_string_ops_prefer_neon } } */
 /* { dg-options "-save-temps -O2 -fno-inline" } */
 /* { dg-add-options "arm_neon" } */
 
Index: gcc/testsuite/gcc.target/arm/memset-inline-5.c
===
--- gcc/testsuite/gcc.target/arm/memset-inline-5.c  (revision 221098)
+++ gcc/testsuite/gcc.target/arm/memset-inline-5.c  (working copy)
@@ -1,6 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mcpu=cortex-a9" } { "" } } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mtune=cortex-a9" } { "" } } */
+/* { dg-skip-if "Don't inline memset using neon instructions" { ! 
arm_tune_string_ops_prefer_neon } } */
 /* { dg-options "-save-temps -O2 -fno-inline" } */
 /* { dg-add-options "arm_neon" } */
 
Index: gcc/testsuite/gcc.target/arm/memset-inline-6.c
===
--- gcc/testsuite/gcc.target/arm/memset-inline-6.c  (revision 221098)
+++ gcc/testsuite/gcc.target/arm/memset-inline-6.c  (working copy)
@@ -1,6 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mcpu=cortex-a9" } { "" } } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mtune=cortex-a9" } { "" } } */
+/* { dg-skip-if "Don't inline memset using neon instructions" { ! 
arm_tune_string_ops_prefer_neon } } */
 /* { dg-options "-save-temps -O2 -fno-inline" } */
 /* { dg-add-options "arm_neon" } */
 
Index: gcc/testsuite/gcc.target/arm/memset-inline-8.c
===
--- gcc/testsuite/gcc.target/arm/memset-inline-8.c  (revision 221098)
+++ gcc/testsuite/gcc.target/arm/memset-inline-8.c  (working copy)
@@ -1,6 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mcpu=cortex-a9" } { "" } } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mtune=cortex-a9" } { "" } } */
+/* { dg-skip-if "Don't inline memset using neon instructions" { ! 
arm_tune_string_ops_prefer_neon } } */
 /* { dg-options "-save-temps -O2 -fno-inline"  } */
 /* { dg-add-options "arm_neon" } */
 
Index: gcc/testsuite/gcc.target/arm/memset-inline-9.c
===
--- gcc/testsuite/gcc.target/arm/memset-inline-9.c  (revision 221098)
+++ gcc/testsuite/gcc.target/arm/memset-inline-9.c  (working copy)
@@ -1,6 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mcpu=cortex-a9" } { "" } } */
-/* { dg-skip-if "Don't inline memset using neon instructions on cortex-a9" { 
*-*-* } { "-mtune=cortex-a9" } { "" } } */
+/* { dg-skip-if "Don't inline memset using neon instructions" { ! 
arm_tune_string_ops_prefer_neon } } */
 /* { dg-options "-save-temps -Os -fno-inline" } */
 /* { dg-add-options "arm_neon" } */
 
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 221098)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -2954,6 +2954,14 @@ proc check_effective_target_arm_cortex_m { } {
 } "-mthumb"]
 }
 
+# Return 1 if this compilation turns string_ops_prefer_neon on.
+
+proc check_effective_target_arm_tune_string_ops_prefer_neon { } {
+return [check_no_messages_and_pattern arm_tune_string_ops_prefer_neon 
"@string_