[PATCH] Testcase for nonzero_bits & __builtin_unreachable

2013-10-29 Thread Jakub Jelinek
On Sat, Oct 26, 2013 at 12:19:33AM +0200, Jakub Jelinek wrote:
> And here is a patch that allows vectorization without peeling for alignment
> and scalar loop for bound even for fn2, fn3 and fn4 in the following
> testcase, though as with the range __builtin_unreachable () notes, it is
> quite fragile, because it only works if there are no immediate uses of the
> tested SSA_NAME before the assertion.  Perhaps more reliable way would be to
> convert those assertions into __builtin_assume_aligned, but that has the
> disadvantage that its first argument is a pointer and it returns a pointer,
> so we'd need to cast integers to pointers and back, or add ASSUME_ALIGNED
> internal function.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.
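
(For illustration: a rough sketch of the integer-to-pointer round trip
described above. The variable names are hypothetical; only the builtin
itself is real GCC API.)

  /* Assert that the integer x is a multiple of 256 by round-tripping it
     through __builtin_assume_aligned, which only takes and returns
     pointers.  */
  __UINTPTR_TYPE__ tmp = (__UINTPTR_TYPE__) x;
  tmp = (__UINTPTR_TYPE__) __builtin_assume_aligned ((void *) tmp, 256);
  x = (int) tmp;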

And here is as a follow-up to the whole patchset of 4 patches a testcase:

2013-10-29  Jakub Jelinek  

* gcc.dg/vect/vect-align-3.c: New test.

--- gcc/testsuite/gcc.dg/vect/vect-align-3.c.jj	2013-10-29 08:18:08.636348586 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-align-3.c	2013-10-29 08:21:14.931384138 +0100
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+int a[2048];
+
+void
+f1 (int x, int y)
+{
+  int i;
+  x &= -256;
+  y &= -256;
+  for (i = x + 256; i < y; i++)
+    a[i]++;
+}
+
+void
+f2 (int x, int y)
+{
+  int i;
+  if (x & 31)
+    __builtin_unreachable ();
+  if (y & 31)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+void
+f3 (int x, int y)
+{
+  int i;
+  if (x % 256)
+    __builtin_unreachable ();
+  if (y % 256)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+void
+f4 (int x, int y)
+{
+  int i;
+  if ((x % 256) != 0)
+    __builtin_unreachable ();
+  if ((y % 256) != 0)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+/* { dg-final { scan-tree-dump-not "vect_do_peeling_for_loop_bound" "vect" } } */
+/* { dg-final { scan-tree-dump-not "loop peeled for vectorization" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

Jakub


Re: [PATCH, ARM] Fix line number data for PIC register setup code

2013-10-29 Thread Eric Botcazou
> Updated patch, re-bootstrapped on x86_64 and committed to trunk.
> 
> Also applied to 4.7 and 4.8 branches, the same problem is present there.

You only asked for approval on trunk though, and I'm not sure we really care 
about the results of the GDB testsuite on the 4.7 branch at this point, so 
let's exclude the 4.7 branch here.

-- 
Eric Botcazou


Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-29 Thread Uros Bizjak
Hello!

> For the define_expand I added as below, the else body is there to
> avoid fall-through transformations to ABS operation in optabs.c.
> Otherwise ABS will be converted to other operations even though we have
> corresponding instructions from SSSE3.

No, it won't be.

Fallthrough will generate the pattern that will be matched by the insn
pattern above, just like you are doing by hand below.

> (define_expand "abs<mode>2"
>   [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand")
> (abs:VI124_AVX2_48_AVX512F
>  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))]
>   "TARGET_SSE2"
> {
>   if (!TARGET_SSSE3)
> ix86_expand_sse2_abs (operands[0], force_reg (<MODE>mode, operands[1]));

Do you really need force_reg here? You are using generic expanders in
ix86_expand_sse2_abs that can handle non-registers operands just as
well.

>   else
> emit_insn (gen_rtx_SET (VOIDmode, operands[0],
>                         gen_rtx_ABS (<MODE>mode, operands[1])));
>   DONE;
> })

Please note that your mailer mangles indents. Please indent your code correctly.

Uros.


RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2

2013-10-29 Thread Bernd Edlinger
Hi,

On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:
>
> On 10/28/2013 03:20 AM, Bernd Edlinger wrote:
>>
>> On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote:
>>>
>>> I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new
>>> pr56997-1.c testcase, it got stuck in infinite recursion between
>>> store_split_bit_field/store_fixed_bit_field and/or
>>> extract_split_bit_field/extract_fixed_bit_field. This didn't show up in
>>> my previous mainline testing.
>>>
>>> [snip]
>>
>> I have attached an update to your patch, that should
>> a) fix the recursion problem.
>> b) restrict -fstrict-volatile-bitfields so as not to violate the C++ memory
>> model.
>>
>
> Thanks for picking up the ball on this -- I've been busy with other
> tasks for several days.
>
> I again tried backporting the patch series along with your fix to GCC
> 4.8 and tested on arm-none-eabi. I found that it was still getting
> stuck in infinite recursion unless the test from this patch hunk
>

hmmm
Actually I used your patch on a clean 4.8.2 and built for --target=arm-eabi.
I have seen the recursion on the extract_bit_field, but not on store_bit_field
so far, maybe you could give me a hint which test case exposes the other
flavour of this recursion problem.

>> @@ -1699,6 +1711,14 @@ extract_bit_field (rtx str_rtx, unsigned
>> {
>> enum machine_mode mode1;
>>
>> + /* Handle extraction of unaligned fields,
>> + this can happen in -fstrict-volatile-bitfields. */
>> + if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
>> + && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
>> + && bitnum % GET_MODE_BITSIZE (GET_MODE (str_rtx)) + bitsize
>> +      > GET_MODE_BITSIZE (GET_MODE (str_rtx)))
>> + str_rtx = adjust_address (str_rtx, word_mode, 0);
>> +
>> /* Handle -fstrict-volatile-bitfields in the cases where it applies. */
>> if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) > 0)
>> mode1 = GET_MODE (str_rtx);
>
> was also inserted into store_bit_field.
>
> I also think that, minimally, this needs to be rewritten as something like
>
> if (MEM_P (str_rtx)
> && STRICT_ALIGNMENT
> && GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
> && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE
> (word_mode)
> && (bitnum % MEM_ALIGN (str_rtx) + bitsize
>        > GET_MODE_BITSIZE (GET_MODE (str_rtx))))
> str_rtx = adjust_address (str_rtx, word_mode, 0);
>

Yes, that looks fine. 
 

> Otherwise, we'll end up using word_mode instead of the field mode on
> targets that support unaligned accesses. I tested that (again, 4.8 and
> arm-none-eabi) and results looked good, but of course ARM isn't one of
> the targets that supports unaligned accesses. :-S
>
> I'm still not convinced this is the right fix, though. It seems to me
> that callers of store_bit_field and extract_bit_field in expr.c ought
> not to be messing with the mode of the rtx when
> flag_strict_volatile_bitfields > 0, because that is losing information
> about the original mode that we are trying to patch up here by forcing
> it to word_mode. Instead, I think the callers ought to be passing the
> declared field mode into store_bit_field and extract_bit_field, and
> those functions ought to change the mode of the incoming rtx to match
> the field mode only if strict_volatile_bitfield_p assures us that the
> insertion/extraction can be done in that mode.
>

The problem starts likely at expr.c when this is done:

if (volatilep && flag_strict_volatile_bitfields > 0)
 to_rtx = adjust_address (to_rtx, mode1, 0);

this restricts the possible access mode not only for bit-fields
but for all possible volatile members. But -fstrict-volatile-bitfields
is supposed to affect bit-fields only.
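
(A hedged sketch of the narrower guard being described; whether a bit-field
check such as DECL_BIT_FIELD_TYPE is usable at that exact spot in expr.c is
an assumption.)

  /* Only restrict the access mode for volatile bit-field members, not
     for every volatile member access.  */
  if (volatilep && flag_strict_volatile_bitfields > 0
      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))  /* hypothetical guard */
    to_rtx = adjust_address (to_rtx, mode1, 0);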

> Alternatively, if strict_volatile_bitfield_p returns false but
> flag_strict_volatile_bitfields > 0, then always force to word_mode and
> change the -fstrict-volatile-bitfields documentation to indicate that's
> the fallback if the insertion/extraction cannot be done in the declared
> mode, rather than claiming that it tries to do the same thing as if
> -fstrict-volatile-bitfields were not enabled at all.
>
> Either way, still needs more work, and more thorough testing (more
> targets, and obviously trunk as well as the 4.8 backport) before I'd
> consider this ready to commit. I might or might not be able to find
> some more time to work on this in the next week.
>

Yes. 

And it would be good to reach Richard's Nov-21 deadline for GCC 4.9.

Bernd.

> -Sandra
> 

[PATCH] Adjust testsuite with respect to -fvect-cost-model changes

2013-10-29 Thread Richard Biener

-fvect-cost-model changed its behavior to depend on the optimization
level.  The former behavior is restored by using -fvect-cost-model=dynamic
for -fvect-cost-model and -fvect-cost-model=unlimited for 
-fno-vect-cost-model.  The following patch makes this change
throughout the testsuite - which also highlights the fact that
we miss testcases for the cheap cost model.

Previously selected testcases that regressed with the above change
were adjusted, but clearly all cost-model testcases were developed
with the dynamic model in mind.

Committed.

Richard.

2013-10-29  Richard Biener  

* g++.dg/vect/slp-pr56812.cc: Adjust with respect to -fvect-cost-model
changes.
* gcc.dg/vect/bb-slp-32.c: Likewise.
* gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp: Likewise.
* gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Likewise.
* gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp: Likewise.
* gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp: Likewise.
* gcc.target/powerpc/crypto-builtin-1.c: Likewise.
* gcc.target/powerpc/p8vector-builtin-1.c: Likewise.
* gcc.target/powerpc/p8vector-builtin-2.c: Likewise.
* gcc.target/powerpc/p8vector-builtin-3.c: Likewise.
* gcc.target/powerpc/p8vector-builtin-4.c: Likewise.
* gcc.target/powerpc/p8vector-builtin-5.c: Likewise.
* gcc.target/powerpc/p8vector-vectorize-1.c: Likewise.
* gcc.target/powerpc/p8vector-vectorize-2.c: Likewise.
* gcc.target/powerpc/p8vector-vectorize-3.c: Likewise.
* gcc.target/powerpc/p8vector-vectorize-4.c: Likewise.
* gcc.target/powerpc/p8vector-vectorize-5.c: Likewise.
* gfortran.dg/vect/vect.exp: Likewise.

Index: trunk/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
===
*** trunk.orig/gcc/testsuite/g++.dg/vect/slp-pr56812.cc	2013-08-30 09:55:27.0 +0200
--- trunk/gcc/testsuite/g++.dg/vect/slp-pr56812.cc	2013-10-29 10:06:33.146955101 +0100
***
*** 1,7 
  /* { dg-do compile } */
  /* { dg-require-effective-target vect_float } */
  /* { dg-require-effective-target vect_hw_misalign } */
! /* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model" } */
  
  class mydata {
  public:
--- 1,7 
  /* { dg-do compile } */
  /* { dg-require-effective-target vect_float } */
  /* { dg-require-effective-target vect_hw_misalign } */
! /* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic" } */
  
  class mydata {
  public:
Index: trunk/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
===
*** trunk.orig/gcc/testsuite/gcc.dg/vect/bb-slp-32.c	2013-08-27 11:13:22.0 +0200
--- trunk/gcc/testsuite/gcc.dg/vect/bb-slp-32.c	2013-10-29 10:06:38.660018142 +0100
***
*** 1,6 
  /* { dg-do compile } */
  /* { dg-require-effective-target vect_int } */
! /* { dg-additional-options "-fvect-cost-model" } */
  
  void bar (int *);
  int foo (int *p)
--- 1,6 
  /* { dg-do compile } */
  /* { dg-require-effective-target vect_int } */
! /* { dg-additional-options "-fvect-cost-model=dynamic" } */
  
  void bar (int *);
  int foo (int *p)
Index: trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp
===
*** trunk.orig/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp	2013-01-11 10:54:37.0 +0100
--- trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp	2013-10-29 10:07:01.423277877 +0100
*** if { ![istarget i?86*-*-*] && ![istarget
*** 28,34 
  set DEFAULT_VECTCFLAGS ""
  
  # These flags are used for all targets.
! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model"
  
  # If the target system supports vector instructions, the default action
  # for a test is 'run', otherwise it's 'compile'.  Save current default.
--- 28,34 
  set DEFAULT_VECTCFLAGS ""
  
  # These flags are used for all targets.
! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model=dynamic"
  
  # If the target system supports vector instructions, the default action
  # for a test is 'run', otherwise it's 'compile'.  Save current default.
Index: trunk/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp
===
*** trunk.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp	2013-01-11 10:54:37.0 +0100
--- trunk/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp	2013-10-29 10:06:48.601131525 +0100
*** if ![is-effective-target powerpc_altivec
*** 33,39 
  set DEFAULT_VECTCFLAGS ""
  
  # These flags are used for all targets.
! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model" "-fno-common"
  
  # If the target system supports vector in

[PATCH GCC]Simplify address expression in IVOPT

2013-10-29 Thread bin.cheng
Hi,
I noticed that IVOPT generates complex address expressions like the ones
below for IV bases.
&arr_base[0].y
&arr[0]
&MEM[p+o]
It's even worse for targets that support auto-increment addressing modes,
because IVOPT adjusts such base expressions with +/- step, then creates:
&arr_base[0].y +/- step
&arr[0] +/- step
&MEM[p+o] +/- step
It has two disadvantages:
1) Cost computation in IVOPT can't handle complex address expressions and
generally returns spill_cost for them, which is bad since address IVs are
important to IVOPT.
2) IVOPT creates duplicate candidates for IVs which have the same value in
different forms (see the short example below); for instance, two candidates
are generated, one each for "&a[0]" and "&a".  Again, it's even worse for
auto-increment addressing modes.
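
(A minimal illustration of point 2; this is hypothetical code, not taken
from the patch or its testcases.)

  int a[16];

  int
  sum (void)
  {
    int s = 0, i;
    /* "&a[0]" and "(int *) &a" compute the same address in two different
       tree forms, so IVOPT would create an IV candidate for each.  */
    for (i = 0; i < 16; i++)
      s += *(&a[0] + i) + *((int *) &a + i);
    return s;
  }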

This patch fixes the issue by simplifying the address expression at the entry
of allocating the IV struct.  Maybe the simplification could be put in various
fold* functions, but I think it might be better this way, because:
1) fold* functions are used from the front end through various tree
optimizations, and the simplified address expressions may not be what each
optimizer wants.  Think about parallelism-related passes: they might want the
array index information kept for further analysis.
2) In some ways, the simplification conflicts with the current implementation
of the fold* functions.  Take fold_binary_loc as an example: it tries to
simplify "&a[i1] +p c* i2" into "&a[i1+i2]".  Of course we could simplify in
this way for IVOPT too, but that would cause new problems, like: a) we would
have to add code in IVOPT to cope with complex ARRAY_REFs, which is exactly
the thing we want to avoid; b) the simplification can't always be done because
of the signed/unsigned offset problem (especially for auto-increment
addressing modes).
3) There are many entry points for the fold* functions, so the change would be
non-trivial.
4) The simplification is only done in alloc_iv for true (i.e., not duplicate)
iv structs, so the number of such ivs should be moderate.

With these points, I think it might be a win to do the simplification in
IVOPT and create a kind of sandbox for IVOPT to play in.  Any suggestions?

Bootstrapped and tested on x86/x86_64/arm.
The patch causes three test cases to fail on some targets, but all of them
are false alarms, which can be resolved by refining the test cases to check
more accurate information.

Is it OK?

Thanks.
bin

gcc/testsuite/ChangeLog
2013-10-29  Bin Cheng  

* gcc.dg/tree-ssa/loop-2.c: Refine check condition.
* gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
* gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.

2013-10-29  Bin Cheng  

* tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression.
(get_shiftadd_cost): Check equality using operand_equal_p.
(force_expr_to_var_cost): Refactor the code.  Handle type
conversion.
(split_address_cost): Call force_expr_to_var_cost.Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c  (revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c  (working copy)
@@ -27,7 +27,7 @@ void xxx(void)
 
 /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */
 
 /* 17 * iter should be strength reduced.  */
 
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c   (revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c   (working copy)
@@ -7,7 +7,8 @@
 
 extern char a[];
 
-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can not infer loop iteration from array -- exit test can not be
+   replaced by the array address.  */
 void foo (unsigned int i_width, TYPE dst)
 {
   unsigned long long i = 0;
@@ -21,5 +22,5 @@ void foo (unsigned int i_width, TYPE dst)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c(working copy)
@@ -18,5 +18,5 @@ long foo(long* p, long* p2, int N1, int N2)
   return s;
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
==

[PING] [AArch64] Peepholes to generate ldp and stp instructions

2013-10-29 Thread Hurugalawadi, Naveen
Hi,

Please consider this as a reminder to review the ldp and stp peephole
implementation for AArch64 target.

The patch was originally posted at:-
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg01051.html

Please review the same and let me know if it's okay.
 
Build and tested on aarch64-thunder-elf (using Cavium's internal
simulator). No new regressions.

2013-10-29   Naveen H.S  

gcc/
* config/aarch64/aarch64.md (peephole2s to generate ldp
instruction for 2 consecutive loads from memory): New.
(peephole2s to generate stp instruction for 2 consecutive
stores to memory in integer mode): New.
(peephole2s to generate ldp instruction for 2 consecutive
loads from memory in floating point mode): New.
(peephole2s to generate stp instruction for 2 consecutive
stores to memory in floating point mode): New.

gcc/testsuite
* gcc.target/aarch64/ldp-stp.c: New testcase.

Thanks,
Naveen

diff -uprN '-x*.orig' mainline-orig/gcc/config/aarch64/aarch64.md gcc-4.8.0/gcc/config/aarch64/aarch64.md
--- mainline-orig/gcc/config/aarch64/aarch64.md	2013-10-28 17:15:52.363975264 +0530
+++ gcc-4.8.0/gcc/config/aarch64/aarch64.md	2013-10-29 10:11:09.656082201 +0530
@@ -1068,6 +1068,27 @@
(set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPI 0 "register_operand")
+	(match_operand:GPI 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPI 2 "register_operand")
+	(match_operand:GPI 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+  == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	  (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1085,6 +1106,27 @@
 (set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand")
+	(match_operand:GPI 1 "register_operand"))
+   (set (match_operand:GPI 2 "memory_operand")
+	(match_operand:GPI 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT
+   && GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0))
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+  == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode, XEXP (operands[0], 0),
+  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	  (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "load_pair<mode>"
@@ -1102,6 +1144,28 @@
 (set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPF 0 "register_operand")
+	(match_operand:GPF 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPF 2 "register_operand")
+	(match_operand:GPF 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO (operands[0]) >= 32 && REGNO (operands[2]) >= 32
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+  == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	  (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1119,6 +1183,28 @@
 (set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand")
+	(match_operand:GPF 1 "register_operand"))
+   (set (match_operand:GPF 2 "memory_operand")
+	(match_operand:GPF 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XE

Re: Aliasing: look through pointer's def stmt

2013-10-29 Thread Richard Biener
On Sat, Oct 26, 2013 at 7:07 PM, Marc Glisse  wrote:
> On Fri, 25 Oct 2013, Marc Glisse wrote:
>
>> On Fri, 25 Oct 2013, Richard Biener wrote:
>>
>>> you can followup with handling POINTER_PLUS_EXPR if you like.
>>
>>
>> Like this? (bootstrap+testsuite on x86_64-unknown-linux-gnu)
>>
>> 2013-10-26  Marc Glisse  
>>
>> gcc/
>> * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a
>> POINTER_PLUS_EXPR in the defining statement.
>
>
> This has some issues with the type of the offsets. First, they might
> overflow in some extreme cases (that could probably already happen without
> the patch). But mostly, negative offsets are not supported. There is a
> comment to that effect before ao_ref_init_from_ptr_and_size, and
> ranges_overlap_p takes the offsets as unsigned HOST_WIDE_INT. Currently it
> does indeed seem hard to produce a negative offset there, but handling
> POINTER_PLUS_EXPR (definitely a good thing) would obviously change that.

For the POINTER_PLUS_EXPR offset argument you should use
int_cst_value () to access it (it will unconditionally sign-extend)
and use host_integerp (..., 0).  That leaves the overflow possibility
in place (and you should multiply by BITS_PER_UNIT), which we
ignore in enough other places similar to this one ...
(side-note: for quite some time on my TODO is to make the
gimple alias-machinery use byte-offsets instead of bit-offsets
which wouldn't cause regressions if we finally lowered bitfield
accesses ...).  A way to not ignore it is to do

  off = double_int_from_tree (ptr-plus-offset);
  off = double_int_sext (off, TYPE_PRECISION (...));
  off = double_int_mul (off, BITS_PER_UNIT);
  if (off.fits_shwi ())
...
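
(Spelled out as a compilable fragment -- a hedged rendering of the sketch
above, using the double_int class API trunk had at the time and a
hypothetical variable off_tree holding the POINTER_PLUS_EXPR's offset
operand.)

  double_int off = tree_to_double_int (off_tree);
  /* Sign-extend from the offset type's precision, then convert the byte
     offset to bits.  */
  off = off.sext (TYPE_PRECISION (TREE_TYPE (off_tree)));
  off = off * double_int::from_uhwi (BITS_PER_UNIT);
  if (off.fits_shwi ())
    {
      /* ... safe to fold the bit offset into the ao_ref ...  */
    }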

Richard.

>
> --
> Marc Glisse


Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c

2013-10-29 Thread Ramana Radhakrishnan

On 10/24/13 00:04, Kugan wrote:

Hi,

arm testcases neon-vcond-ltgt.c and neon-vcond-unordered.c fail in the
Linaro 4.8 branch. It is not reproducible with trunk, but it can happen.
Both neon-vcond-ltgt.c and neon-vcond-unordered.c scan for the vbsl
instruction, along with other vector instructions. However, as per the
comment for the "neon_vbsl_internal" md pattern defined in neon.md, gcc can
generate vbsl or vbit or vbif depending on the register allocation.
Therefore, these testcases should scan for one of these three
instructions instead of just vbsl. I have updated the testcases to scan for
vbsl or vbit or vbif now.
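
(A hedged guess at the resulting directive; the exact regular expression in
the committed testcases may differ.)

  /* { dg-final { scan-assembler "vbsl|vbit|vbif" } } */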

Is this OK?


This is OK. Thanks for the analysis on the testcase.

regards
Ramana



Thanks,
Kugan

2013-10-23  Kugan Vivekanandarajah  
* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.

Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c

2013-10-29 Thread Ramana Radhakrishnan

On 10/24/13 00:04, Kugan wrote:

Hi,

arm testcases neon-vcond-ltgt.c and neon-vcond-unordered.c fail in the
Linaro 4.8 branch. It is not reproducible with trunk, but it can happen.
Both neon-vcond-ltgt.c and neon-vcond-unordered.c scan for the vbsl
instruction, along with other vector instructions. However, as per the
comment for the "neon_vbsl_internal" md pattern defined in neon.md, gcc can
generate vbsl or vbit or vbif depending on the register allocation.
Therefore, these testcases should scan for one of these three
instructions instead of just vbsl. I have updated the testcases to scan for
vbsl or vbit or vbif now.

Is this OK?

Thanks,
Kugan

Hit send too soon on my earlier email. A minor nit in the ChangeLog, as
below: there needs to be a newline between the date line and the file names.
2013-10-23  Kugan Vivekanandarajah  

* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.

Ramana

Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-29 Thread Jakub Jelinek
On Wed, Oct 16, 2013 at 09:35:21AM +0400, Yury Gribov wrote:
> >>> I've recently submitted a bug report regarding invalid
> unpoisoning of stack frame redzones
> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone
> take a look at proposed patch (a simple one-liner) and check whether
> it's ok for commit?
> >>
> >> Can you please be more verbose
> >
> > Do you think the proposed patch is ok or should I provide some
> additional info?
> 
> Jakub, Dodji,
> 
> Any updates on this one? Note that this bug is a huge blocker for
> using AddressSanitizer on ARM platforms.

Sorry for the delay, I finally found time to look at it.
While your patch fixes the issue, I wonder if for the case where
shadow_mem before the asan_clear_shadow call is already of the form
(mem (reg)) it isn't better to just adjust next asan_clear_shadow
base from the end of the cleared region rather than from the start of it,
because the end will be already in the right pseudo, while with your
approach it needs to be done in a different register and potentially
increase register pressure.  So here is (completely untested) alternate fix:

2013-10-29  Jakub Jelinek  
Yury Gribov  

PR sanitizer/58543
* asan.c (asan_clear_shadow): Return bool whether the emitted loop
(if any) updated shadow_mem's base.
(asan_emit_stack_protection): If asan_clear_shadow returned true,
adjust shadow_mem and prev_offset.

--- gcc/asan.c.jj   2013-10-23 14:43:15.0 +0200
+++ gcc/asan.c  2013-10-29 10:26:56.085406934 +0100
@@ -878,10 +878,11 @@ asan_shadow_cst (unsigned char shadow_by
 /* Clear shadow memory at SHADOW_MEM, LEN bytes.  Can't call a library call here
    though.  */
 
-static void
+static bool
 asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 {
   rtx insn, insns, top_label, end, addr, tmp, jump;
+  bool ret;
 
   start_sequence ();
   clear_storage (shadow_mem, GEN_INT (len), BLOCK_OP_NORMAL);
@@ -893,12 +894,13 @@ asan_clear_shadow (rtx shadow_mem, HOST_
   if (insn == NULL_RTX)
 {
   emit_insn (insns);
-  return;
+  return false;
 }
 
   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
   addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  ret = addr == XEXP (shadow_mem, 0);
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);
@@ -912,6 +914,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_
   jump = get_last_insn ();
   gcc_assert (JUMP_P (jump));
   add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100);
+  return ret;
 }
 
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
@@ -1048,7 +1051,14 @@ asan_emit_stack_protection (rtx base, HO
   (last_offset - prev_offset)
   >> ASAN_SHADOW_SHIFT);
  prev_offset = last_offset;
- asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT);
+ if (asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT))
+   {
+ shadow_mem
+   = adjust_automodify_address (shadow_mem, VOIDmode,
+XEXP (shadow_mem, 0),
+last_size >> ASAN_SHADOW_SHIFT);
+ prev_offset += last_size;
+   }
  last_offset = offset;
  last_size = 0;
}


Jakub


[PATCH] Time profiler - phase 1

2013-10-29 Thread Martin Liška
Hello,
   I've been cooperating with Jan on new profile-based function
reordering work. This first patch introduces a new GCOV counter that
instruments each function call and stores the time of the first run of
a function.

Bootstrapped/regtested on x86_64-linux and i686-linux.

Thanks,
Martin
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fca665b..3b62bcc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2013-10-29  Martin Liska  
+		Jan Hubicka  
+
+	* cgraph.c (dump_cgraph_node): Profile dump added.
+	* cgraph.h (struct cgraph_node): New time profile variable added.
+	* cgraphclones.c (cgraph_clone_node): Time profile is cloned.
+	* gcov-io.h (gcov_type): New profiler type introduced.
+	* ipa-profile.c (lto_output_node): Streaming for time profile added.
+	(input_node): Time profiler is read from LTO stream.
+	* predict.c (maybe_hot_count_p): Hot prediction changed.
+	* profile.c (instrument_values): New case for time profiler added.
+	(compute_value_histograms): Read of time profile.
+	* tree-pretty-print.c (dump_function_header): Time profiler is dumped.
+	* tree-profile.c (init_ic_make_global_vars): Time profiler function added.
+	(gimple_init_edge_profiler): TP function instrumentation.
+	(gimple_gen_time_profiler): New.
+	* value-prof.c (gimple_add_histogram_value): Support for time profiler
+	added.
+	(dump_histogram_value): TP type added to dumps.
+	(visit_hist): More sensitive check that takes TP into account.
+	(gimple_find_values_to_profile): TP instrumentation.
+	* value-prof.h (hist_type): New histogram type added.
+	(struct histogram_value_t): Pointer to struct function added.
+	* libgcc/Makefile.in: New GCOV merge function for TP added.
+	* libgcov.c: function_counter variable introduced.
+	(_gcov_merge_time_profile): New.
+	(_gcov_time_profiler): New.	
+
 2013-10-29  David Malcolm  
 
 	* doc/gty.texi ("Inheritance and GTY"): Make it clear that
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 52d9ab0..c95a54e 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1890,6 +1890,7 @@ dump_cgraph_node (FILE *f, struct cgraph_node *node)
   if (node->profile_id)
 fprintf (f, "  Profile id: %i\n",
 	 node->profile_id);
+  fprintf (f, "  First run: %i\n", node->tp_first_run);
   fprintf (f, "  Function flags:");
   if (node->count)
 fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 7706419..479d49f 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -247,7 +247,6 @@ struct GTY(()) cgraph_clone_info
   bitmap combined_args_to_skip;
 };
 
-
 /* The cgraph data structure.
Each function decl has assigned cgraph_node listing callees and callers.  */
 
@@ -324,6 +323,8 @@ struct GTY(()) cgraph_node {
   unsigned tm_clone : 1;
   /* True if this decl is a dispatcher for function versions.  */
   unsigned dispatcher_function : 1;
+  /* Time profiler: first run of function.  */
+  int tp_first_run;
 };
 
 
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 800dd2c..a05fd77 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -207,6 +207,7 @@ cgraph_clone_node (struct cgraph_node *n, tree decl, gcov_type count, int freq,
   new_node->frequency = n->frequency;
   new_node->clone = n->clone;
   new_node->clone.tree_map = NULL;
+  new_node->tp_first_run = n->tp_first_run;
   if (n->count)
 {
   if (new_node->count > n->count)
diff --git a/gcc/gcov-io.c b/gcc/gcov-io.c
index 5a21c1f..aea272b 100644
--- a/gcc/gcov-io.c
+++ b/gcc/gcov-io.c
@@ -68,7 +68,7 @@ gcov_open (const char *name, int mode)
 #if IN_LIBGCOV
   const int mode = 0;
 #endif
-#if GCOV_LOCKED
+#if GCOV_LOCKED  
   struct flock s_flock;
   int fd;
 
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index ea8d9a7..4650836 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -342,9 +342,10 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;
   counter.  */
 #define GCOV_COUNTER_IOR	7  /* IOR of the all values passed to
   counter.  */
-#define GCOV_LAST_VALUE_COUNTER 7  /* The last of counters used for value
+#define GCOV_TIME_PROFILER  8 /* Time profile collecting first run of a function */
+#define GCOV_LAST_VALUE_COUNTER 8  /* The last of counters used for value
   profiling.  */
-#define GCOV_COUNTERS		8
+#define GCOV_COUNTERS		9
 
 /* Number of counters used for value profiling.  */
 #define GCOV_N_VALUE_COUNTERS \
@@ -352,7 +353,7 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;
 
   /* A list of human readable names of the counters */
 #define GCOV_COUNTER_NAMES	{"arcs", "interval", "pow2", "single", \
-   "delta", "indirect_call", "average", "ior"}
+   "delta", "indirect_call", "average", "ior", "time_profiler"}
 
   /* Names of merge functions for counters.  */
 #define GCOV_MERGE_FUNCTIONS	{"__gcov_merge_add",	\
@@ -362,7 +363,8 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;
  "__gcov_merge_delta",  \
  "__gcov_merge_single", \
  "__gcov_merge_add",	\
- "__gcov_merge_ior"}
+ "__

[PATCH] tree-ssa documentation fix

2013-10-29 Thread Martin Liška
Hello,
  I've noticed that some parts of the documentation are obsolete, and this
patch adds documentation for the new functions that replaced the old macros.

Thanks,
Martin
diff --git a/ChangeLog b/ChangeLog
index 1c50b9b..d64fbcf 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2013-10-29	Martin Liska 
+
+	* doc/tree-ssa.texi: Newly added functions that replace old macros
+	are documented.
+
 2013-10-21  Cong Hou  
 
 	* MAINTAINERS (Write After Approval): Add myself.
diff --git a/gcc/doc/tree-ssa.texi b/gcc/doc/tree-ssa.texi
index aaf741b..93f596d 100644
--- a/gcc/doc/tree-ssa.texi
+++ b/gcc/doc/tree-ssa.texi
@@ -526,31 +526,29 @@ the result of ``merging'' @code{a_1}, @code{a_2} and @code{a_3}.
 Hence, PHI nodes mean ``one of these operands.  I don't know
 which''.
 
-The following macros can be used to examine PHI nodes
+The following functions can be used to examine PHI nodes
 
-@defmac PHI_RESULT (@var{phi})
+@defun gimple_phi_result (@var{phi})
 Returns the @code{SSA_NAME} created by PHI node @var{phi} (i.e.,
 @var{phi}'s LHS)@.
-@end defmac
+@end defun
 
-@defmac PHI_NUM_ARGS (@var{phi})
+@defun gimple_phi_num_args (@var{phi})
 Returns the number of arguments in @var{phi}.  This number is exactly
 the number of incoming edges to the basic block holding @var{phi}@.
-@end defmac
+@end defun
 
-@defmac PHI_ARG_ELT (@var{phi}, @var{i})
-Returns a tuple representing the @var{i}th argument of @var{phi}@.
-Each element of this tuple contains an @code{SSA_NAME} @var{var} and
-the incoming edge through which @var{var} flows.
-@end defmac
+@defun gimple_phi_arg (@var{phi}, @var{i})
+Returns @var{i}th argument of @var{phi}@.
+@end defun
 
-@defmac PHI_ARG_EDGE (@var{phi}, @var{i})
+@defun gimple_phi_arg_edge (@var{phi}, @var{i})
 Returns the incoming edge for the @var{i}th argument of @var{phi}.
-@end defmac
+@end defun
 
-@defmac PHI_ARG_DEF (@var{phi}, @var{i})
+@defun gimple_phi_arg_def (@var{phi}, @var{i})
 Returns the @code{SSA_NAME} for the @var{i}th argument of @var{phi}.
-@end defmac
+@end defun
 
 
 @subsection Preserving the SSA form
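
(For readers of the patch, a hedged usage sketch of the accessors being
documented; the iteration boilerplate around them is an assumption.)

  gimple_stmt_iterator gsi;
  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    {
      gimple phi = gsi_stmt (gsi);
      tree lhs = gimple_phi_result (phi);          /* the PHI's SSA_NAME LHS */
      unsigned i;
      for (i = 0; i < gimple_phi_num_args (phi); i++)
        {
          tree arg = gimple_phi_arg_def (phi, i);  /* i-th argument */
          edge e = gimple_phi_arg_edge (phi, i);   /* its incoming edge */
          /* ... use lhs, arg and e ...  */
        }
    }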


Re: [PING] [AArch64] Peepholes to generate ldp and stp instructions

2013-10-29 Thread Ramana Radhakrishnan



You are better off CCing the maintainers for such reviews. Let me do 
that for you. I cannot approve or reject this patch but I have a few 
comments as below.


On 10/29/13 09:22, Hurugalawadi, Naveen wrote:


diff -uprN '-x*.orig' mainline-orig/gcc/config/aarch64/aarch64.md gcc-4.8.0/gcc/config/aarch64/aarch64.md
--- mainline-orig/gcc/config/aarch64/aarch64.md	2013-10-28 17:15:52.363975264 +0530
+++ gcc-4.8.0/gcc/config/aarch64/aarch64.md	2013-10-29 10:11:09.656082201 +0530
@@ -1068,6 +1068,27 @@
 (set_attr "mode" "<MODE>")]
  )

+(define_peephole2
+  [(set (match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPI 2 "register_operand")
+   (match_operand:GPI 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG


Use the predicates REG_P, CONST_INT_P and so on. Don't do this in this
form.



+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+  == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+  plus_constant (Pmode, XEXP (operands[1], 0),
+ GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"


Why use optimize_size here? I would have thought it was always good to
generate ldp and stp instructions. If you want to make it a core-specific
decision, make it a tuning decision, but don't disable it just based
on size.


Additionally, if this check is common between all the patterns, it would
be better to factor it out into a predicate or a function in its own
right.


+  [(parallel [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2) (match_dup 3))])]
+)
+
  ;; Operands 0 and 2 are tied together by the final condition; so we allow
  ;; fairly lax checking on the second memory operation.
  (define_insn "store_pair<mode>"
@@ -1085,6 +1106,27 @@
 (set_attr "mode" "<MODE>")]
  )

+(define_peephole2
+  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand")
+   (match_operand:GPI 1 "register_operand"))
+   (set (match_operand:GPI 2 "memory_operand")
+   (match_operand:GPI 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT
+   && GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0))
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+  == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+  plus_constant (Pmode, XEXP (operands[0], 0),
+ GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2) (match_dup 3))])]
+)
+
  ;; Operands 1 and 3 are tied together by the final condition; so we allow
  ;; fairly lax checking on the second memory operation.
  (define_insn "load_pair<mode>"
@@ -1102,6 +1144,28 @@
 (set_attr "mode" "<MODE>")]
  )

+(define_peephole2
+  [(set (match_operand:GPF 0 "register_operand")
+   (match_operand:GPF 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPF 2 "register_operand")
+   (match_operand:GPF 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO (operands[0]) >= 32 && REGNO (operands[2]) >= 32


I'm not sure what this means here, but this check doesn't look right - 32
is V0_REGNUM - why are you checking that here? If you really need to do
that check then use V0_REGNUM, or it may be better to see if this really
is a SIMD regnum. Can you not use something like FP_REGNUM_P here
instead?




+   && REGNO_REG_CLASS (REGNO (operands[0]))
+  == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+  plus_constant (Pmode, XEXP (operands[1], 0),
+ GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2) (match_dup 3))])]
+)
+
  ;; Operands 0 and 2 are tied together by the final condition; so we allow
  ;; fairly lax checking on the second memory operation.
  (define_insn "store_pair<mode>"
@@ -1119,6 +1183,28 @@
 (set_attr "mode" "<MODE>")]
  )

+(define_peephole2
+  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand")
+   (match_operand:GPF 1 "register_operand"))
+   (set (match_operand:GPF 2 "memo

Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.

2013-10-29 Thread Kirill Yukhin
Hello Richard,

On 28 Oct 14:45, Richard Henderson wrote:
> On 10/28/2013 01:58 PM, Kirill Yukhin wrote:
> > Hello Richard,
> > On 28 Oct 08:20, Richard Henderson wrote:
> >> Why is a masked *scalar* operation useful?
> > 
> > The reason the instructions exist is so that
> > you can do fully fault correct predicated scalar algorithms.
> 
> Using VEC_MERGE isn't the proper representation for that.
> 
> If that's your real goal, then COND_EXEC is the only way to let
> rtl know that faults are suppressed in the false condition.

I believe the cond_exec approach is supposed to look like this:
  (define_subst "mask_scalar"
    [(set (match_operand:SUBST_V 0)
          (vec_merge:SUBST_V
            (match_operand:SUBST_V 1)
            (match_operand:SUBST_V 2)
            (const_int 1)))]
    "TARGET_AVX512F"
    [(cond_exec (eq:CC
                  (match_operand: 3 "register_operand" "k")
                  (const_int 1))
       (set (match_dup 0)
            (vec_merge:SUBST_V
              (match_dup 1)
              (match_dup 2)
              (const_int 1))))])

But this will only describe merge-masking, and in an incorrect way.
We would need to add a clobber to signal that even for a false
condition we zero the higher part of the register.
The preferable zero-masking would be indistinguishable from merge-masking,
and we would need to choose which mask mode to enable. A bad turn.

IMHO, we have 3 options to implement scalar masked insns:
  1. `vec_merge' over vec_merge (current approach).
 Pro.
   1. Precise semantic description
   2. Unified approach with vector patterns
   3. Freedom for simplifier to reduce EVEX to VEX for
   certain const masks
 Cons.
   1. Too precise semantic description and as a
   consequence complicated code in md-file

  2. `cond_exec' approach
Pro.
  1. Look useful for compiler when trying to generate
  predicated code
Cons.
  1. Not precise. Extra clobbers (?) needed: to signal
  that we're changing the register even for false
  condition in cond_exec 
  2. Unable to describe zero masking nicely
  3. Code still complicated as for option #1
  4. Simplifier won't work (clobber is always clobber)

  3. Make all masked scalar insns to be unspecs
Pro.
  1. Straight-forward, not overweighted. Enough for
  intrinsics to work
Cons.
  1. Since every unspec needs a code: substs won't be
  applied directly: huge volume of similar code
  2. Simplifier won't work
  3. Generation of predicated code become hard

Am I missing some options, or is that all we have?
If so, what option would you prefer?

Thanks, K


Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins

2013-10-29 Thread Richard Biener
On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law  wrote:
> On 10/25/13 11:57, Ilya Enkovich wrote:
>
>>
>> There are currently two known issues with LTO. The first one is an ICE in
>> the LTO streamer when it reads instrumented code. The second one is an
>> uninitialized flag_check_pointers when code is compiled by lto1 (do you
>> know why that may happen BTW?). It also causes an ICE because instrumented
>> code is met when not expected. Of course, I'll fix these problems
>> anyway, but I was going to allow '-fcheck-pointers -flto' only when the
>> checker testsuite has a 100% pass rate with lto.
>
> No idea about why LTO would be failing in this way -- my experience with LTO
> is virtually nil.
>
> I'm OK with the restriction as a temporary measure.  But we really dislike
> this kidn of stuff, so I need a commitment from you to dive into these
> problems and sort out what's really going on.

I'm not ok with adding such arbitrary restrictions.  LTO support should
be trivial.

Richard.

>
>>
>> Ilya
>>
>


Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS

2013-10-29 Thread Richard Biener
On Mon, Oct 28, 2013 at 9:16 PM, Mike Stump  wrote:
> Ping?

Can you please get rid of PARTIAL_INT_MODE_NAME by
making the name a required argument to PARTIAL_INT_MODE?
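
(A hedged sketch of the requested interface in a target's modes.def; the
22-bit SH precision comes from the quoted discussion below, and the exact
macro shape is an assumption.)

  /* Before: the partial mode's name was implicit and no precision could
     be given.  */
  PARTIAL_INT_MODE (SI);

  /* After: the precision and name become required arguments.  */
  PARTIAL_INT_MODE (SI, 22, PSI);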

Ok with that change (given target maintainers are ok with this).

Thanks,
Richard.

> On Oct 18, 2013, at 10:03 AM, Mike Stump  wrote:
>> On Oct 18, 2013, at 4:24 AM, Richard Biener  
>> wrote:
>>> I agree.  Btw, the "implementation defined" precision of PDI on SH-5 
>>> definitely
>>> looks interesting, but I wonder how you can perform "implementation defined"
>>> arithmetic on that PDI mode then ;)
>>
>> Easy, perform it in the maximal size supported…
>>
>>> What else blocks this patch?
>>
>> Nothing.  We have heard from everyone, and all the issues found by testing 
>> were on msp, and those were exclusively due to port issues that I believe 
>> now have been resolved.  The below patch has the change to 22 bits for PSI 
>> on SH.
>>
>> Ok?
>>
>


Re: [PATCH, ARM] Fix line number data for PIC register setup code

2013-10-29 Thread Tom de Vries
On 29/10/13 09:12, Eric Botcazou wrote:
>> Updated patch, re-bootstrapped on x86_64 and committed to trunk.
>>
>> Also applied to 4.7 and 4.8 branches, the same problem is present there.
> 
> You only asked for approval on trunk though,

Eric,

Sorry about that.

> and I'm not sure we really care
> about the results of the GDB testsuite on the 4.7 branch at this point,

FWIW, I originally encountered the problem on 4.7.

> so let's exclude the 4.7 branch here.
> 

Reverted on the 4.7 branch.

Thanks,
- Tom



Re: [RFC] [Testsuite,ARM] Neon intrinsics executable tests

2013-10-29 Thread Ramana Radhakrishnan

On 10/09/13 23:16, Christophe Lyon wrote:

Hi,

This patch is a first small sample of dejagnu-ization of my ARM Neon
intrinsics tests.


Thanks for attempting this and apologies for the slow response - I've 
been busy with a few other things internally.




It's derived from my previous work at
http://gitorious.org/arm-neon-tests/arm-neon-tests  which supports all
the ARM intrinsics, with executable tests. As I have to manually
transform each test (to include the expected data, and a few other
modifications), it's quite a bit tedious.


I'll take your word that this is tedious :) I can see how you get the
reference input from the original text file into headers and have to
marshal things there.


Irrespective of our earlier conversations on this, I'm now actually
wondering if, instead of doing this and integrating it into the GCC
source base, it may be easier to write a harness to test this cross on
qemu or natively. Additionally, setting up an auto-tester to do this
might be a more productive use of time than manually dejagnuizing
this, which appears to be a tedious and slow process.




I'd like your feedback before continuing, as there are a lot more
files to come.

I have made some cleanup to help review, but the two .h files will
need to grow as more intrinsics will be added (see the original ones).


Which one should I compare this with in terms of the original file ?



I'd like to keep the modifications at a minimal level, to save my time
when adapting each test (there are currently 145 test files, so 143
left :-).


On to the patch itself.

The prefix TEST_ seems a bit misleading in that it suggests this is 
testing something when in reality this is initializing stuff.





Thanks,

Christophe.

This patch only introduces new files.
2013-10-03  Christophe Lyon

testsuite/gcc.target/arm/neon-intrinsics/
* neon-intrinsics.exp: New driver file.
* arm-neon-ref.h: New file, with common vector construction
helpers.
* compute_ref_data.h: New file, with helpers for input data
initialization.
* ref_vaba.c: New test file for the vaba family of intrinsics.
* ref_vld1.c: New test file for vld1.


neontests.patch.txt


diff -rNup '--exclude=.git' gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp
--- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp	1970-01-01 01:00:00.0 +0100
+++ gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp	2013-05-08 23:08:46.271786347 +0200
@@ -0,0 +1,35 @@
+# Copyright (C) 1997-2013 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an ARM target.
+if ![istarget arm*-*-*] then {
+  return
+}


Also for aarch64*-*-*, as all these intrinsics are compatible with the
aarch64 port. I would also prefer that this be tortured over multiple
optimization levels, as we often find issues with different
optimization levels.


More later I need to get back to something else and I need to play more 
with your original testsuite - but I'd like some discussion around some 
of these points anyway.


Ramana


+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+   ""
+
+# All done.
+dg-finish
diff -rNup '--exclude=.git' gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h
--- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h	1970-01-01 01:00:00.0 +0100
+++ gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h	2013-05-09 00:48:59.395628726 +0200
@@ -0,0 +1,349 @@
+#ifndef _ARM_NEON_REF_H_
+#define _ARM_NEON_REF_H_
+
+#include 
+#include 
+#include 
+#include 
+
+#define xSTR(X) #X
+#define STR(X) xSTR(X)
+
+#define xNAME1(V,T) V ## _ ##  T
+#define xNAME(V,T) xNAME1(V,T)
+
+#define VAR(V,T,W) xNAME(V,T##W)
+#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W)
+
+#define VECT_NAME(T, W, N) T##W##x##N
+#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L
+#define VECT_TYPE(T, W, N) xNAME(VECT_NAME(T,W,N),t)
+#define VECT_ARRAY_TYPE(T, W, N, 
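
(For reference, a hedged illustration of how the token-pasting helpers above
expand, derived from the macro definitions rather than from the patch.)

  /* VECT_NAME (int, 32, 4)     expands to  int32x4
     VECT_TYPE (int, 32, 4)     expands to  int32x4_t
     VAR (buffer, int, 32)      expands to  buffer_int32
     VAR_DECL (buffer, int, 32) declares an int32_t named buffer_int32.  */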

Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-29 Thread Richard Biener
On Sun, Oct 27, 2013 at 7:55 PM, Andrew Pinski  wrote:
> On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski  wrote:
>> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski  wrote:
>>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen
>>>  wrote:
 On 18 October 2013 00:58, Jeff Law  wrote:
> On 10/17/13 05:03, Richard Biener wrote:

 Is it OK for trunk?
>>>
>>>
>>> I had a much simpler change which did basically the same from 4.7 (I
>>> can update it if people think this is a better approach).
>>
>>
>> I like that more (note you can now use is_gimple_condexpr as predicate
>> for force_gimple_operand).
>
> The obvious question is whether or not Andrew's simpler change picks up as
> many transformations as Zhenqiang's change.  If not are the things missed
> important.
>
> Zhenqiang, can you do some testing of your change vs Andrew P.'s change?

 Here is a rough compare:

 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included
 in my patch). Root cause is that it does not skip "LABEL". The guard
 to do this opt should be the same the bb_has_overhead_p in my patch.
>>>
>>> This should be an easy change, I am working on this right now.
>>>

 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not
 efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate

   _3 = a_2(D) > 0;
   _5 = b_4(D) > 0;
   _6 = _3 | _5;
   _9 = c_7(D) <= 0;
   _10 = ~_6;
   _11 = _9 & _10;
   if (_11 == 0)

 With my patch, it will generate

   _3 = a_2(D) > 0;
   _5 = b_4(D) > 0;
   _6 = _3 | _5;
   _9 = c_7(D) > 0;
   _10 = _6 | _9;
   if (_10 != 0)
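
(For reference: a source shape that yields GIMPLE like the above -- an
assumption, since the testcase body is not quoted here.)

  void g (void);

  void
  f (int a, int b, int c)
  {
    if (a > 0 || b > 0 || c > 0)
      g ();
  }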
>>>
>>> As mentioned otherwise, this seems like a missed optimization inside
>>> forwprop.  When I originally wrote this code there used to be two
>>> cases one for & and one for |, but this was removed sometime and I
>>> just made the code evolve with that.
>>
>> Actually I can make a small patch (3 lines) to my current patch which
>> causes tree-ssa-ifcombine.c to produce the behavior of your patch.
>>
>>  if (result_inv)
>>{
>>  t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t);
>>  result_inv = false;
>>}
>>
>> I will submit a new patch after some testing of my current patch which
>> fixes items 1 and 2.
>
>
> Here is my latest patch which adds the testcases from Zhenqiang's
> patch and fixes item 1 and 2.
>
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

   return i;
 }
+/* Return a new iterator pointing to the first non-debug non-label statement in
+   basic block BB.  */
+
+static inline gimple_stmt_iterator
+gsi_start_nondebug_after_labels_bb (basic_block bb)


vertical space before the comment missing.

@@ -631,7 +669,7 @@ tree_ssa_ifcombine (void)
   bbs = single_pred_before_succ_order ();
   calculate_dominance_info (CDI_DOMINATORS);

-  for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i)
+  for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--)
 {
   basic_block bb = bbs[i];
   gimple stmt = last_stmt (bb);

The original code matches what phiopt does which even has a comment:

  /* Search every basic block for COND_EXPR we may be able to optimize.

 We walk the blocks in order that guarantees that a block with
 a single predecessor is processed before the predecessor.
 This ensures that we collapse inner ifs before visiting the
 outer ones, and also that we do not try to visit a removed
 block.  */
  bb_order = single_pred_before_succ_order ();
  n = n_basic_blocks - NUM_FIXED_BLOCKS;

  for (i = 0; i < n; i++)

so if the reverse order is better here (and in phiopt?) then it at least
needs a comment explaining why.

Otherwise the patch looks good, but please iterate with Jeff over
the dom testcase issue.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
> (ifcombine_ifandif): Handle cases where
> maybe_fold_and_comparisons fails, combining the branches
> anyways.
> (tree_ssa_ifcombine): Inverse the order of
> the basic block walk, increases the number of combinings.
> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>
>
> testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>
> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
> conditional move to be used.
> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
> intermediate".
>
>
>
>
>>
>> Thanks,
>> Andrew Pinski
>>
>>
>>>

 3) The good thing of Andrew P.'s change is that "Inverse the order of
 the basic bloc

[PATCH v4, nds32] Andes nds32 port, libgcc part

2013-10-29 Thread Chung-Ju Wu
Hi, all,

This is v4 patch for Andes nds32 port on libgcc part.

Thanks to Joseph Myers's comments, we now remove the .file directive
from crtzero.S.  The rationale for this can be found at:
  http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00123.html

The diff is as follows and the patch file is attached.

diff --git a/libgcc/config/nds32/crtzero.S b/libgcc/config/nds32/crtzero.S
index 5ef0361..6509869 100644
--- a/libgcc/config/nds32/crtzero.S
+++ b/libgcc/config/nds32/crtzero.S
@@ -33,8 +33,6 @@
 !!
 !!==
-   .file   "crtzero.S"
-
 
 !!--
 !! Jump to start up code
 !!--


libgcc/
2013-10-29  Chung-Ju Wu  
Shiva Chen  

* config/nds32: New directory and files.


Best regards,
jasonwucj


3-nds32-libgcc.v4.patch.tar.gz


Re: free is a killer

2013-10-29 Thread Richard Biener
On Mon, Oct 28, 2013 at 11:05 PM, Marc Glisse  wrote:
> On Mon, 28 Oct 2013, Jeff Law wrote:
>
>> On 10/26/13 01:15, Marc Glisse wrote:
>>>
>>> Hello,
>>>
>>> this patch teaches gcc that free kills the memory its argument points
>>> to. The equality test is probably too strict, I guess we can loosen it
>>> later (unless you have suggestions?).
>>>
>>> Note that the corresponding code for BUILT_IN_MEMCPY and others seems
>>> suspicious to me, it looks like it is testing for equality between a
>>> pointer and a mem_ref, which is unlikely to happen.
>>>
>>> Bootstrap+testsuite on x86_64-unknown-linux-gnu.
>>>
>>> 2013-10-27  Marc Glisse  
>>>
>>>  PR tree-optimization/19831
>>> gcc/
>>>  * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle BUILT_IN_FREE.
>>>
>>> gcc/testsuite/
>>>  * gcc.dg/tree-ssa/alias-25.c: New file.
>>
>> OK for the trunk.
>
>
> Thanks.
>
>
>> I agree the MEM_REF* and VA_END cases look strange and I think they're
>> wrong as well.  Did you happen to try fixing them and running the testsuite
>> to see if anything fails?
>>
>> ISTM your testcase could be tweaked to test conclusively if they're doing
>> the right thing -- and regardless of what's found that test can/should make
>> its way into the testsuite.

The VA_END case looks ok (though == equality testing is a bit
too conservative); the SSA_NAME case of the builtins handling
looks wrong indeed.
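
For reference, a minimal example of what the committed change enables
(my own illustration, not the new alias-25.c testcase):

  #include <stdlib.h>

  void g (int *p)
  {
    *p = 1;   /* free (p) kills *p, so this store is now dead
                 and DSE can remove it */
    free (p);
  }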

> I checked and it does the wrong thing (I don't have the testcase handy
> anymore, but it shouldn't be hard to recreate one), I even wrote a patch
> (attached) but it is related to:
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html
> so it can't go in. A more limited fix (still missing many cases) would be
> possible. I didn't test VA_END, but it doesn't look as bad (it is the
> SSA_NAME case that is wrong for memcpy).
>
>
> While I am posting non-submitted patches, does the following vaguely make
> sense? I couldn't find a testcase that exercised it without some local
> patches, but the idea (IIRC) is that with:
> struct A { struct B b; int i; }
> we can easily end up with one ao_ref.base that is a MEM_REF[p] and another
> one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from
> noticing that p->i and q->b don't alias. Maybe I should instead find a way
> to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger
> structure that starts with A it is likely to fail as well...
>
> --- gcc/tree-ssa-alias.c(revision 204095)
> +++ gcc/tree-ssa-alias.c(working copy)
> @@ -1099,22 +1099,24 @@ indirect_refs_may_alias_p (tree ref1 ATT
>
>/* If both references are through the same type, they do not alias
>   if the accesses do not overlap.  This does extra disambiguation
>   for mixed/pointer accesses but requires strict aliasing.  */
>if ((TREE_CODE (base1) != TARGET_MEM_REF
> || (!TMR_INDEX (base1) && !TMR_INDEX2 (base1)))
>&& (TREE_CODE (base2) != TARGET_MEM_REF
>   || (!TMR_INDEX (base2) && !TMR_INDEX2 (base2)))
>&& same_type_for_tbaa (TREE_TYPE (base1), TREE_TYPE (ptrtype1)) == 1
>&& same_type_for_tbaa (TREE_TYPE (base2), TREE_TYPE (ptrtype2)) == 1
> -  && same_type_for_tbaa (TREE_TYPE (ptrtype1),
> -TREE_TYPE (ptrtype2)) == 1)
> +  && (same_type_for_tbaa (TREE_TYPE (ptrtype1),
> + TREE_TYPE (ptrtype2)) == 1
> + || same_type_for_tbaa (TREE_TYPE (TREE_TYPE (ptr1)),
> +TREE_TYPE (TREE_TYPE (ptr2))) == 1))

No, the types of 'ptr1' and 'ptr2' are not in any way relevant.

Richard.

>  return ranges_overlap_p (offset1, max_size1, offset2, max_size2);
>
>/* Do type-based disambiguation.  */
>if (base1_alias_set != base2_alias_set
>&& !alias_sets_conflict_p (base1_alias_set, base2_alias_set))
>  return false;
>
>/* Do access-path based disambiguation.  */
>if (ref1 && ref2
>&& (handled_component_p (ref1) || handled_component_p (ref2))
>
>
>
> In the category "ugly hack that I won't submit", let me also attach this
> patch that sometimes helps notice that malloc doesn't alias other things (I
> don't know if the first hunk ever fires).
>
> --
> Marc Glisse


[PATCH v4, nds32] Andes nds32 port, documentation part

2013-10-29 Thread Chung-Ju Wu
This is v4 patch for Andes nds32 port on documentation part.

Thanks for Joseph Myers's comments.  Now we use @deftypefn and @samp
to refine our documentation content.

The diff is as follows and the patch file is attached.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f0002a9..5f91813 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12432,33 +12432,38 @@ once the handler returns.

 These built-in functions are available for the NDS32 target:

-@table @code
-@item void __builtin_nds32_isync (int *@var{addr})
+@deftypefn {Built-in Function} void __builtin_nds32_isync (int *@var{addr})
 Insert an ISYNC instruction into the instruction stream where
 @var{addr} is an instruction address for serialization.
+@end deftypefn

-@item void __builtin_nds32_isb (void)
+@deftypefn {Built-in Function} void __builtin_nds32_isb (void)
 Insert an ISB instruction into the instruction stream.
+@end deftypefn

-@item int __builtin_nds32_mfsr (int @var{sr})
+@deftypefn {Built-in Function} int __builtin_nds32_mfsr (int @var{sr})
 Return the content of a system register which is mapped by @var{sr}.
+@end deftypefn

-@item int __builtin_nds32_mfusr (int @var{usr})
+@deftypefn {Built-in Function} int __builtin_nds32_mfusr (int @var{usr})
 Return the content of a user space register which is mapped by @var{usr}.
+@end deftypefn

-@item void __builtin_nds32_mtsr (int @var{value}, int @var{sr})
+@deftypefn {Built-in Function} void __builtin_nds32_mtsr (int @var{value}, int @var{sr})
 Move the @var{value} to a system register which is mapped by @var{sr}.
+@end deftypefn
-@item void __builtin_nds32_mtusr (int @var{value}, int @var{usr})
+@deftypefn {Built-in Function} void __builtin_nds32_mtusr (int @var{value}, int @var{usr})
 Move the @var{value} to a user space register which is mapped by @var{usr}.
+@end deftypefn

-@item void __builtin_nds32_setgie_en (void)
+@deftypefn {Built-in Function} void __builtin_nds32_setgie_en (void)
 Enable global interrupt.
+@end deftypefn

-@item void __builtin_nds32_setgie_dis (void)
+@deftypefn {Built-in Function} void __builtin_nds32_setgie_dis (void)
 Disable global interrupt.
-
-@end table
+@end deftypefn

 @node picoChip Built-in Functions
 @subsection picoChip Built-in Functions
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 68af98e..cbc0644 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1864,7 +1864,7 @@ supported since version 4.7.2 and is the default
in 4.8.0 and newer.

 @item --with-nds32-lib=@var{library}
 Specifies that @var{library} setting is used for building @file{libgcc.a}.
-Currently, the valid @var{library} is 'newlib' or 'mculib'.
+Currently, the valid @var{library} is @samp{newlib} or @samp{mculib}.
 This option is only supported for the NDS32 target.

 @item --with-build-time-tools=@var{dir}


gcc/
2013-10-29  Chung-Ju Wu  
Shiva Chen  

* doc/invoke.texi (NDS32 options): Document nds32 specific options.
* doc/md.texi (NDS32 family): Document nds32 specific constraints.
* doc/install.texi (Cross-Compiler-Specific Options): Document
--with-nds32-lib for nds32 target.
* doc/extend.texi (Function Attributes): Document nds32 specific
attributes.


Best regards,
jasonwucj


5-nds32-documentation.v4.patch


Re: free is a killer

2013-10-29 Thread Richard Biener
On Tue, Oct 29, 2013 at 6:35 AM, Jeff Law  wrote:
> On 10/28/13 16:05, Marc Glisse wrote:
>>
>>
>> I checked and it does the wrong thing (I don't have the testcase handy
>> anymore, but it shouldn't be hard to recreate one), I even wrote a patch
>> (attached) but it is related to:
>> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html
>> so it can't go in. A more limited fix (still missing many cases) would
>> be possible. I didn't test VA_END, but it doesn't look as bad (it is the
>> SSA_NAME case that is wrong for memcpy).
>
> Actually, it's fairly easy to see with memset.  Something like this:
>
> /* { dg-do compile } */
> /* { dg-options "-O1 -fdump-tree-optimized" } */
>
> void f (long *p) {
>   *p = 42;
>   p[4] = 42;
>   __builtin_memset (p, 0, 100);
> }
>
> /* { dg-final { scan-tree-dump-not "= 42" "optimized" } } */
> /* { dg-final { cleanup-tree-dump "optimized" } } */
>
>
> I'm sure it can be made to work with memcpy as well, it's just harder
> because you have to ensure the source/dest args don't alias.
>
>
>
>>
>>
>> While I am posting non-submitted patches, does the following vaguely
>> make sense? I couldn't find a testcase that exercised it without some
>> local patches, but the idea (IIRC) is that with:
>> struct A { struct B b; int i; }
>> we can easily end up with one ao_ref.base that is a MEM_REF[p] and
>> another one a MEM_REF[(B*)q] where p and q are of type A*, and that
>> prevents us from noticing that p->i and q->b don't alias. Maybe I should
>> instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually
>> points to a larger structure that starts with A it is likely to fail as
>> well...
>
> I've never looked at the alias oracle stuff, but ISTM that offsets from the
> base ought to be enough to disambiguate?
>
>
>>
>>
>>
>> In the category "ugly hack that I won't submit", let me also attach this
>> patch that sometimes helps notice that malloc doesn't alias other things
>> (I don't know if the first hunk ever fires).
>
> Well, one thing that can be done here (I think it comes from Steensgaard and
> I'm pretty sure it's discussed in Morgan's text) is you assign a storage
> class and propagate that.   Once you do so, you can eliminate certain
> things.  You use a vrp-like engine with a fairly simple meet operator to
> handle propagation of the storage class.
>
> Once you have storage classes for all the pointers, you can make certain
> deductions about aliasing relationships.  Including what things in the
> malloc storage class can and can not alias.

Of course in the example we have the "global memory" storage class
(incoming function argument) and "malloc memory" which is really
the same storage class.  It only becomes a different storage class
if you factor in flow analysis (for which the current PTA code is weak
once you leave the SSA web).
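
For concreteness, the straight-line case where the distinction would
be valid (a made-up sketch):

  int f (int *p)
  {
    int *q = (int *) __builtin_malloc (sizeof (int));
    *p = 1;
    *q = 2;      /* fresh storage from malloc cannot alias *p here */
    return *p;   /* so this load could be folded to 1 */
  }

Once q escapes or control flow merges, that property is lost again.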

Richard.

> Jeff
>
>


Re: RFA: Andes nds32 port v4 patch

2013-10-29 Thread Chung-Ju Wu
2013/10/29 Joseph S. Myers :
> Since Richard looked at the machine description part of the port, I looked
> at the other pieces.  I'm happy with those pieces with the following minor
> changes:
>
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00520.html (libgcc): remove
> the .file directive from crtzero.S.  See
>  for why such
> directives are best avoided.
>

Thank you for the link.  Now I realize the rationale of
why it is better to remove the .file directive. :)

Modify it accordingly:
  http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html

> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00522.html (documentation):
> use @deftypefn rather than @table @code when documenting built-in
> functions.  In install.texi, say @samp{newlib} and @samp{mculib} instead
> of using '' quotes.
>

Thanks for the comment.  Now I modify it as you suggested:
  http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html

> --
> Joseph S. Myers
> jos...@codesourcery.com

Thank you very much for the comments on libgcc and documentation parts.
Here I re-list the v4 patch of nds32 port:

  Machine Description --
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02153.html
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02154.html
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02156.html
  Libgcc and Documentation --
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html

Are those nds32 port patches OK to be committed?


Best regards,
jasonwucj


Re: [C++ Patch] PR 58888

2013-10-29 Thread Paolo Carlini

Hi,

On 10/28/2013 07:05 PM, Jason Merrill wrote:
My question is, why do we need that whole block for massaging VAR_DECL 
initializers? That all ought to be handled properly by cp_finish_decl. 
Does removing everything after



else if (!VAR_P (value))
gcc_unreachable ();

work?
In fact, it works great, excellent. I only had to tweak
g++.dg/other/warning1.C a bit, which, for some reason, used dg-warning,
whereas in fact we have been matching errors (we are removing the "field
initializer is not constant" error and replacing it with "'(1.0e+0f /
0.0f)' is not a constant expression" etc.)


Tested x86_64-linux.

Thanks,
Paolo.

Index: cp/decl2.c
===
--- cp/decl2.c  (revision 204152)
+++ cp/decl2.c  (working copy)
@@ -955,26 +955,6 @@ grokfield (const cp_declarator *declarator,
/* C++11 NSDMI, keep going.  */;
   else if (!VAR_P (value))
gcc_unreachable ();
-  else if (!processing_template_decl)
-   {
- if (TREE_CODE (init) == CONSTRUCTOR)
-   init = digest_init (TREE_TYPE (value), init, tf_warning_or_error);
- init = maybe_constant_init (init);
-
- if (init != error_mark_node && !TREE_CONSTANT (init))
-   {
- /* We can allow references to things that are effectively
-static, since references are initialized with the
-address.  */
- if (TREE_CODE (TREE_TYPE (value)) != REFERENCE_TYPE
- || (TREE_STATIC (init) == 0
- && (!DECL_P (init) || DECL_EXTERNAL (init) == 0)))
-   {
- error ("field initializer is not constant");
- init = error_mark_node;
-   }
-   }
-   }
 }
 
   if (processing_template_decl && VAR_OR_FUNCTION_DECL_P (value))
Index: testsuite/g++.dg/cpp0x/auto40.C
===
--- testsuite/g++.dg/cpp0x/auto40.C (revision 0)
+++ testsuite/g++.dg/cpp0x/auto40.C (working copy)
@@ -0,0 +1,11 @@
+// PR c++/58888
+// { dg-do compile { target c++11 } }
+
+#include <initializer_list>
+
+struct A
+{
+  static constexpr auto b{1.0};
+};
+
+constexpr decltype(A::b) A::b;
Index: testsuite/g++.dg/other/warning1.C
===
--- testsuite/g++.dg/other/warning1.C   (revision 204152)
+++ testsuite/g++.dg/other/warning1.C   (working copy)
@@ -7,8 +7,8 @@ extern "C" int printf(const char *, ...);
 
 struct S
 {
-  static const float inf = 1.0f / 0.0f; // { dg-warning "1.0|float|initializ" }
-  static const float nan = 0.0f / 0.0f; // { dg-warning "0.0|float|initializ" }
+  static const float inf = 1.0f / 0.0f; // { dg-error "1.0|float|initializ" }
+  static const float nan = 0.0f / 0.0f; // { dg-error "0.0|float|initializ" }
 };
 
 int main()


Re: [PATCH, ARM] Fix line number data for PIC register setup code

2013-10-29 Thread Eric Botcazou
> Reverted on the 4.7 branch.

Thanks.

-- 
Eric Botcazou


Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP

2013-10-29 Thread Richard Biener
On Fri, 25 Oct 2013, Jakub Jelinek wrote:

> Hi!
> 
> As discussed on IRC, this patch attempts to preserve VRP computed
> range info for some simple __builtin_unreachable () using assertions.
> If there are no immediate uses of some SSA_NAME except for those in
> a condition guarding __builtin_unreachable () and in ASSERT_EXPR
> in the following basic block, we can copy the range info from the assert
> lhs (which is lost during remove_range_assertions) and thus preserve it.
> Eventually we might then remove the __builtin_unreachable () in that case
> and still benefit from user annotating the source.
> In the testcase below in bar we have even without this patch the expected
> range info on the SSA_NAME set to x + 1, but in foo there was no range
> info preserved.  As we have just one user of get_range_info right now,
> not creating a testcase.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> int
> foo (int x)
> {
>   if (x < 26 || x > 37)
> __builtin_unreachable ();
>   return x;
> }
> 
> int
> bar (int x)
> {
>   if (x < 26 || x > 37)
> __builtin_unreachable ();
>   return x + 1;
> }
> 
> 2013-10-25  Jakub Jelinek  
> 
>   * tree-vrp.c (remove_range_assertions): If ASSERT_EXPR_VAR
>   has no other immediate uses but in the condition and ASSERT_EXPR
>   and the other successor of the predecessor bb is
>   __builtin_unreachable (), set_range_info of the ASSERT_EXPR_VAR
>   to the range info of the ASSERT_EXPR's lhs.
> 
> --- gcc/tree-vrp.c.jj 2013-10-24 10:19:21.0 +0200
> +++ gcc/tree-vrp.c2013-10-24 14:32:29.065878208 +0200
> @@ -6488,12 +6488,16 @@ remove_range_assertions (void)
>  {
>basic_block bb;
>gimple_stmt_iterator si;
> +  /* 1 if looking at ASSERT_EXPRs immediately at the beginning of
> + a basic block preceded by GIMPLE_COND branching to it and
> + __builtin_unreachable (), -1 if not yet checked, 0 otherwise.  */
> +  int is_unreachable;
>  
>/* Note that the BSI iterator bump happens at the bottom of the
>   loop and no bump is necessary if we're removing the statement
>   referenced by the current BSI.  */
>FOR_EACH_BB (bb)
> -for (si = gsi_start_bb (bb); !gsi_end_p (si);)
> +for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);)
>{
>   gimple stmt = gsi_stmt (si);
>   gimple use_stmt;
> @@ -6501,30 +6505,96 @@ remove_range_assertions (void)
>   if (is_gimple_assign (stmt)
>   && gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
> {
> + tree lhs = gimple_assign_lhs (stmt);
>   tree rhs = gimple_assign_rhs1 (stmt);
>   tree var;
>   tree cond = fold (ASSERT_EXPR_COND (rhs));
> - use_operand_p use_p;
> + use_operand_p use_p, use2_p;
>   imm_use_iterator iter;
>  
>   gcc_assert (cond != boolean_false_node);
>  
> - /* Propagate the RHS into every use of the LHS.  */
>   var = ASSERT_EXPR_VAR (rhs);
> - FOR_EACH_IMM_USE_STMT (use_stmt, iter,
> -gimple_assign_lhs (stmt))
> + gcc_assert (TREE_CODE (var) == SSA_NAME);
> +
> + if (!POINTER_TYPE_P (TREE_TYPE (lhs))
> + && SSA_NAME_RANGE_INFO (lhs))
> +   {
> + if (is_unreachable == -1)
> +   {
> + is_unreachable = 0;
> + if (single_pred_p (bb))
> +   {
> + basic_block pred_bb = single_pred (bb);
> + gimple last = last_stmt (pred_bb);
> + if (last && gimple_code (last) == GIMPLE_COND)
> +   {
> + basic_block other_bb
> +   = EDGE_SUCC (pred_bb, 0)->dest;
> + if (other_bb == bb)
> +   other_bb = EDGE_SUCC (pred_bb, 1)->dest;
> + if (EDGE_COUNT (other_bb->succs) == 0)
> +   {
> + gimple_stmt_iterator gsi
> +   = gsi_after_labels (other_bb);
> + if (!gsi_end_p (gsi)
> + && gimple_call_builtin_p
> + (gsi_stmt (gsi),
> +  BUILT_IN_UNREACHABLE))
> +   is_unreachable = 1;
> +   }
> +   }

Please factor this out into a separate function.  Like
obfuscated_fallthru_edge_p () (better name?).  Best somewhere
in tree-cfg.c.  Also you want to skip debug stmts ... (not sure
we remove them reliably enough, but I can envision a compare-debug
fail with some DCE disabled at least).
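
Something like this completely untested sketch (using the strawman
name; the debug-stmt skipping is open-coded):

  static bool
  obfuscated_fallthru_edge_p (basic_block bb)
  {
    if (!single_pred_p (bb))
      return false;
    basic_block pred_bb = single_pred (bb);
    gimple last = last_stmt (pred_bb);
    if (!last || gimple_code (last) != GIMPLE_COND)
      return false;
    basic_block other_bb = EDGE_SUCC (pred_bb, 0)->dest;
    if (other_bb == bb)
      other_bb = EDGE_SUCC (pred_bb, 1)->dest;
    if (EDGE_COUNT (other_bb->succs) != 0)
      return false;
    /* Skip labels and debug stmts before looking for the call.  */
    gimple_stmt_iterator gsi = gsi_after_labels (other_bb);
    while (!gsi_end_p (gsi) && is_gimple_debug (gsi_stmt (gsi)))
      gsi_next (&gsi);
    return (!gsi_end_p (gsi)
            && gimple_call_builtin_p (gsi_stmt (gsi),
                                      BUILT_IN_UNREACHABLE));
  }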

> +   }
> +   }
> + /* Handle
> +if (x_7 >= 10 && x_7 < 20)
> +  __builtin_unreachable ();
> +x_8 = ASSERT_EXPR ;
> +  

Re: [PATCH] Introduce [sg]et_nonzero_bits

2013-10-29 Thread Richard Biener
On Fri, 25 Oct 2013, Jakub Jelinek wrote:

> Hi!
> 
> tree-ssa-ccp.c already computes which bits are known to be zero, but
> we preserve that info only for pointers and not for integers.
> This patch changes SSA_NAME_RANGE_INFO, so we preserve that info even for
> integers.  The bitmask is also computed from range info.
> There are no users of this besides ccp itself right now, but I'm working
> on using that info e.g. in the vectorizer.
> 
> I had to tweak one testcase, because it relied on one of the two warnings
> being emitted during VRP1 and one during VRP2, but because of the range info
> derived nonzero_bits we now manage to optimize away the second test during
> CCP3, which doesn't have -Wtype-limits warnings.  If it would be too
> important to keep warning even there, I guess it could check get_range_info
> when folding a condition to 0/1 and warn if the rhs isn't in the lhs' range.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> During the development of the patch I had there code to not only
> use get_range_info if otherwise returning VARYING, but also
> if it was already constant, but that failed bootstrap miserably:
> 
> It was something along the lines of:
> 
> ++/* If get_nonzero_bits tells us some bits
> ++   have to be zero and they aren't, something
> ++   went wrong.  Only bits in value where val.mask
> ++   has zeroes (i.e. valid bit from CCP), mask has ones
> ++   (don't care about bits outside of precision) and
> ++   has zeroes in nonzero_bits (guaranteed to be zero)
> ++   should be verified to be zero.  */
> ++double_int valv = tree_to_double_int (val.value);
> ++gcc_assert ((valv & ~val.mask
> ++ & ~nonzero_bits & mask).is_zero ());
> ++if (!(valv & ~nonzero_bits & mask).is_zero ())
> ++  val.value = double_int_to_tree (TREE_TYPE (lhs),
> ++  valv & nonzero_bits);
> ++if (nonzero_bits.is_zero ())
> ++  val.mask = double_int_zero;
> ++else
> ++  val.mask = val.mask & (nonzero_bits | ~mask);
> 
> The problem was that during the iteration when only a subset of the edges
> were executable it sometimes failed the assertion, dunno if that is
> the result of the nonzero bits info in this case coming from completely
> different algorithm (from VRP ranges) and that simply until the CCP
> iterations settle it can be wrong, so just to be safe in that case I'm
> adding the info only in ccp_finalize (and the new nonzero_bits mask with
> the old one).  From what I could see, the range info on that testcase looked
> just fine, the nonzero_bits derived from it too, if desirable, I can try to
> cook up a small testcase showing that (reduce from libgcov.c that exhibited
> that).

Surely you can't rely on CCP and VRP computing exactly the same
nonzero_bits.  As you don't record/compute zero_bits you can't
tell whether an unset bit in nonzero_bits is "don't know" or
if it is "zero".  And you cannot do an assert during iteration
(during iteration optimistically 'wrong' values can disappear).

During CCP iteration the rule is that bits may only be added to mask
(and obviously the constant itself on a CONSTANT lattice value may
not change).  That is, a value may go from, say, mask 0b0010 to mask
0b0110 as more bits become unknown, but never back.

More comments inline

> 2013-10-24  Jakub Jelinek  
> 
>   * gimple-pretty-print.c (dump_ssaname_info): Print newline also
>   in case of VR_VARYING.  Print get_nonzero_bits if not all ones.
>   * tree-ssanames.h (struct range_info_def): Add nonzero_bits field.
>   (set_nonzero_bits, get_nonzero_bits): New prototypes.
>   * tree-ssa-ccp.c (get_default_value): Use get_range_info to see if
>   a default def isn't partially constant.
>   (ccp_finalize): If after IPA, set_range_info if integral SSA_NAME
>   is known to be partially zero.
>   (evaluate_stmt): If we'd return otherwise VARYING, use get_range_info
>   to see if a default def isn't partially constant.
>   * tree-ssanames.c (set_range_info): Initialize nonzero_bits upon
>   creation of a range, if VR_RANGE, try to improve nonzero_bits from
>   the range.
>   (set_nonzero_bits, get_nonzero_bits): New functions.
> 
>   * g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure
>   we warn about it already during VRP1 pass.
> 
> --- gcc/gimple-pretty-print.c.jj  2013-10-23 14:43:12.0 +0200
> +++ gcc/gimple-pretty-print.c 2013-10-24 17:26:59.650945232 +0200
> @@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe
>if (!POINTER_TYPE_P (TREE_TYPE (node))
>&& SSA_NAME_RANGE_INFO (node))
>  {
> -  double_int min, max;
> +  double_int min, max, nonzero_bits;
>value_range_type range_type = get_range_info (node, &min, &max);
>  
>if (range_type == VR_VARYING)
> @@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *bu

Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-29 Thread Yury Gribov

> Sorry for the delay, I finally found time to look at it.

Np, thanks for helping!

> While your patch fixes the issue, I wonder
> ...
> potentially increase register pressure.

Makes sense. I didn't take care of this because I believed that we can
freely allocate vregs and rely on the register allocator to do its work.


> So here is (completely untested) alternate fix:

Nice, this seems to fix my repro cases.

I'll only be able to test my larger ARM app in 1-2 days. We could wait 
until then or commit now.


-Y



[AArch64] Fix size of memory store for the vst_lane intrinsics

2013-10-29 Thread James Greenhalgh

Hi,

The vst_lane_ intrinsics should write
(sizeof (lane_type) * n) bytes to memory.

In their current form, their asm constraints suggest a write size of
(sizeof (vector_type) * n). This is anywhere from 1 to 16 times too
much data, and can cause huge headaches with dead store elimination.

This patch better models how much data we will be writing, which in
turn lets us eliminate the memory clobber. Together, we avoid the
problems with dead store elimination.
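
To illustrate the failure mode (a made-up example, not one of the
regression tests):

  #include <arm_neon.h>

  int8_t buf[16];

  void
  f (int8x8x2_t v)
  {
    buf[4] = 42;               /* lies inside the 16 bytes the old
                                  constraint claims are written, so
                                  DSE may wrongly delete this store */
    vst2_lane_s8 (buf, v, 0);  /* really writes only buf[0] and buf[1] */
  }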

Tested with aarch64.exp and checked the C++ neon mangling test which
often breaks when you do these ugly casts.

OK?

Thanks,
James

---
gcc/

2013-10-29  James Greenhalgh  

* config/aarch64/arm_neon.h
(__ST2_LANE_FUNC): Better model data size.
(__ST3_LANE_FUNC): Likewise.
(__ST4_LANE_FUNC): Likewise.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 787ff15..7a63ea1 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -14704,16 +14704,19 @@ __LD4_LANE_FUNC (uint64x2x4_t, uint64_t, 2d, d, u64, q)
 
 #define __ST2_LANE_FUNC(intype, ptrtype, regsuffix,			\
 			lnsuffix, funcsuffix, Q)			\
+  typedef struct { ptrtype __x[2]; } __ST2_LANE_STRUCTURE_##intype;	\
   __extension__ static __inline void	\
   __attribute__ ((__always_inline__))	\
-  vst2 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr,		\
+  vst2 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr,			\
  intype b, const int c)		\
   {	\
+__ST2_LANE_STRUCTURE_##intype *__p =\
+(__ST2_LANE_STRUCTURE_##intype *)ptr;	\
 __asm__ ("ld1 {v16." #regsuffix ", v17." #regsuffix "}, %1\n\t"	\
 	 "st2 {v16." #lnsuffix ", v17." #lnsuffix "}[%2], %0\n\t"	\
-	 : "=Q"(*(intype *) ptr)	\
+	 : "=Q"(*__p)		\
 	 : "Q"(b), "i"(c)		\
-	 : "memory", "v16", "v17");	\
+	 : "v16", "v17");		\
   }
 
 __ST2_LANE_FUNC (int8x8x2_t, int8_t, 8b, b, s8,)
@@ -14743,16 +14746,19 @@ __ST2_LANE_FUNC (uint64x2x2_t, uint64_t, 2d, d, u64, q)
 
 #define __ST3_LANE_FUNC(intype, ptrtype, regsuffix,			\
 			lnsuffix, funcsuffix, Q)			\
+  typedef struct { ptrtype __x[3]; } __ST3_LANE_STRUCTURE_##intype;	\
   __extension__ static __inline void	\
   __attribute__ ((__always_inline__))	\
-  vst3 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr,		\
+  vst3 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr,			\
  intype b, const int c)		\
   {	\
+__ST3_LANE_STRUCTURE_##intype *__p =\
+(__ST3_LANE_STRUCTURE_##intype *)ptr;	\
 __asm__ ("ld1 {v16." #regsuffix " - v18." #regsuffix "}, %1\n\t"	\
 	 "st3 {v16." #lnsuffix " - v18." #lnsuffix "}[%2], %0\n\t"	\
-	 : "=Q"(*(intype *) ptr)	\
+	 : "=Q"(*__p)		\
 	 : "Q"(b), "i"(c)		\
-	 : "memory", "v16", "v17", "v18");\
+	 : "v16", "v17", "v18");	\
   }
 
 __ST3_LANE_FUNC (int8x8x3_t, int8_t, 8b, b, s8,)
@@ -14782,16 +14788,19 @@ __ST3_LANE_FUNC (uint64x2x3_t, uint64_t, 2d, d, u64, q)
 
 #define __ST4_LANE_FUNC(intype, ptrtype, regsuffix,			\
 			lnsuffix, funcsuffix, Q)			\
+  typedef struct { ptrtype __x[4]; } __ST4_LANE_STRUCTURE_##intype;	\
   __extension__ static __inline void	\
   __attribute__ ((__always_inline__))	\
-  vst4 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr,		\
+  vst4 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr,			\
  intype b, const int c)		\
   {	\
+__ST4_LANE_STRUCTURE_##intype *__p =\
+(__ST4_LANE_STRUCTURE_##intype *)ptr;	\
 __asm__ ("ld1 {v16." #regsuffix " - v19." #regsuffix "}, %1\n\t"	\
 	 "st4 {v16." #lnsuffix " - v19." #lnsuffix "}[%2], %0\n\t"	\
-	 : "=Q"(*(intype *) ptr)	\
+	 : "=Q"(*__p)		\
 	 : "Q"(b), "i"(c)		\
-	 : "memory", "v16", "v17", "v18", "v19");			\
+	 : "v16", "v17", "v18", "v19");\
   }
 
 __ST4_LANE_FUNC (int8x8x4_t, int8_t, 8b, b, s8,)

Re: [PATCH] Use get_nonzero_bits to improve vectorization

2013-10-29 Thread Richard Biener
On Fri, 25 Oct 2013, Jakub Jelinek wrote:

> Hi!
> 
> The following patch makes use of the computed nonzero_bits preserved
> in the SSA_NAME_RANGE_INFO.
> I chose to write a new routine instead of improving current
> highest_pow2_factor, because that routine didn't care about overflows etc.
> and by working on ctz numbers instead of powers of two in UHWI we can handle
> even larger constants etc.  highest_pow2_factor could very well overflow to
> zero etc.  So, the patch introduces a new tree_ctz routine and reimplements
> highest_pow2_factor on top of that, plus uses tree_ctz also in
> get_object_alignment_2 and in the vectorizer to determine if it can avoid
> scalar loop for bound (and indirectly also in the analysis whether peeling
> for alignment is needed).
> 
> With this patch, e.g.
> int a[1024];
> 
> void
> foo (int x, int y)
> {
>   int i;
>   x &= -32;
>   y &= -32;
>   for (i = x + 32; i < y; i++)
> a[i]++;
> }
> can be vectorized without any peeling for alignment or scalar loop
> afterwards.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2013-10-25  Jakub Jelinek  
> 
>   * tree.c (tree_ctz): New function.
>   * tree.h (tree_ctz): New prototype.
>   * tree-ssanames.h (get_range_info, get_nonzero_bits): Change
>   first argument from tree to const_tree.
>   * tree-ssanames.c (get_range_info, get_nonzero_bits): Likewise.
>   * tree-vectorizer.h (vect_generate_tmps_on_preheader): New prototype.
>   * tree-vect-loop-manip.c (vect_generate_tmps_on_preheader): No longer
>   static.
>   * expr.c (highest_pow2_factor): Reimplemented using tree_ctz.
>   * tree-vect-loop.c (vect_analyze_loop_operations,
>   vect_transform_loop): Don't force scalar loop for bound just because
>   number of iterations is unknown, only do it if it is not known to be
>   a multiple of vectorization_factor.
>   * builtins.c (get_object_alignment_2): Use tree_ctz on offset.
> 
> --- gcc/tree.c.jj 2013-10-23 14:43:12.0 +0200
> +++ gcc/tree.c2013-10-25 15:00:55.296178794 +0200
> @@ -2213,6 +2213,110 @@ tree_floor_log2 (const_tree expr)
> : floor_log2 (low));
>  }
>  
> +/* Return number of known trailing zero bits in EXPR, or, if the value of
> +   EXPR is known to be zero, the precision of its type.  */
> +
> +int

unsigned int?

> +tree_ctz (const_tree expr)
> +{
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
> +  && !POINTER_TYPE_P (TREE_TYPE (expr)))
> +return 0;
> +
> +  int ret1, ret2, prec = TYPE_PRECISION (TREE_TYPE (expr));
> +  switch (TREE_CODE (expr))
> +{
> +case INTEGER_CST:
> +  ret1 = tree_to_double_int (expr).trailing_zeros ();
> +  return MIN (ret1, prec);
> +case SSA_NAME:
> +  ret1 = get_nonzero_bits (expr).trailing_zeros ();
> +  return MIN (ret1, prec);
> +case PLUS_EXPR:
> +case MINUS_EXPR:
> +case BIT_IOR_EXPR:
> +case BIT_XOR_EXPR:
> +case MIN_EXPR:
> +case MAX_EXPR:
> +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));

This first recurses but if either one returns 0 you don't have
to (that cuts down the recursion from exponential to linear in
the common case?).  Thus, early out on ret == 0?
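
I.e. something like:

  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
  if (ret1 == 0)
    return 0;
  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
  return MIN (ret1, ret2);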

> +  return MIN (ret1, ret2);
> +case POINTER_PLUS_EXPR:
> +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> +  ret2 = MIN (ret2, prec);

Why do you need that here but not elsewhere when processing
binary ops?

> +  return MIN (ret1, ret2);
> +case BIT_AND_EXPR:
> +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> +  return MAX (ret1, ret2);
> +case MULT_EXPR:
> +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> +  return MIN (ret1 + ret2, prec);
> +case LSHIFT_EXPR:
> +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> +  if (host_integerp (TREE_OPERAND (expr, 1), 1)

check that first before recursing for op0; if op1 is negative
you simply return ret1, which looks wrong, too.

> +   && ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1)
> +   < (unsigned HOST_WIDE_INT) prec))

This check is to avoid overflowing ret1 + ret2?  If so, why not add

> + {
> +   ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1);

  ret2 = MIN (ret2, prec);

instead?

> +   return MIN (ret1 + ret2, prec);
> + }
> +  return ret1;
> +case RSHIFT_EXPR:
> +  if (host_integerp (TREE_OPERAND (expr, 1), 1)
> +   && ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1)
> +   < (unsigned HOST_WIDE_INT) prec))
> + {
> +   ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> +   ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1);
> +   if (ret1 > ret2)
> + return ret1 - ret2;
> + }
> +  return 0;

Seems to be slightly better structured.  Lo

Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.

2013-10-29 Thread Eric Botcazou
> My reading of memory_address_addr_space is that MODE is the mode of the
> memory reference, not the mode of the address.  I fail to see how
> passing in the mode of the address in the first call can be correct.
> 
> What am I missing?

Nothing, it's me confusing

rtx memory_address_addr_space (enum machine_mode, rtx, addr_space_t);

with

rtx convert_memory_address_addr_space (enum machine_mode, rtx, addr_space_t);

for a long time apparently...  Sorry about that.  In the meantime, I installed 
the obvious patch to eliminate the small redundancy.  Tested on x86_64/Linux.


2013-10-29  Eric Botcazou  

* expr.c (expand_expr_real_1) : Eliminate small redundancy.


-- 
Eric Botcazou

Index: expr.c
===
--- expr.c	(revision 204106)
+++ expr.c	(working copy)
@@ -9648,8 +9648,8 @@ expand_expr_real_1 (tree exp, rtx target
 	rtx off
 	  = immed_double_int_const (mem_ref_offset (exp), address_mode);
 	op0 = simplify_gen_binary (PLUS, address_mode, op0, off);
+	op0 = memory_address_addr_space (mode, op0, as);
 	  }
-	op0 = memory_address_addr_space (mode, op0, as);
 	temp = gen_rtx_MEM (mode, op0);
 	set_mem_attributes (temp, exp, 0);
 	set_mem_addr_space (temp, as);

Re: libsanitizer merge from upstream r191666

2013-10-29 Thread Jakub Jelinek
> On Wed, Oct 2, 2013 at 12:51 PM, Konstantin Serebryany
> > 2013-10-XX  Kostya Serebryany  
> >
> > * g++.dg/asan/asan_test.cc: Update the test
> > to match the fresh asan run-time.
> > * c-c++-common/asan/stack-overflow-1.c: Ditto.
> >
> > === gcc/ChangeLog
> >
> > 2013-10-XX  Kostya Serebryany  
> >
> > * asan.c: Update to match the changed asan API.
> > (asan_emit_stack_protection): update the string stored in the
> > stack red zone to match new API. Store the PC of the current
> > function in the red zone.
> > (asan_global_struct): update the __asan_global definition to match
> > the new API.
> > (asan_add_global): Ditto.
> > * sanitizer.def: rename __asan_init_v1 to __asan_init_v3

The "Update to match the changed asan API." should either be
dropped, or come on a line before the * asan.c (asan_emit_stack_protection):
line.  All descriptions should start with capital letters, end with ., two
spaces after . if followed by another sentence.
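
I.e. something along these lines:

	* asan.c (asan_emit_stack_protection): Update the string stored
	in the stack red zone to match the new API.  Store the PC of the
	current function in the red zone.
	(asan_global_struct): Update the __asan_global definition to
	match the new API.
	(asan_add_global): Ditto.
	* sanitizer.def: Rename __asan_init_v1 to __asan_init_v3.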

Besides that, here is a (completely untested) attempt to give you
the pc of the first instruction of the function, plus two minor changes
(pp_string (something, "") is useless, and in two spots I've noticed
you didn't add a space before the '(' in a function call).

Finally, if the new libasan is ABI incompatible with the old one, which
seems it is, then libsanitizer/asan/libtool-version (and perhaps also
libsanitizer/tsan/libtool-version, haven't looked if that one is ABI
compatible or not) needs to be bumped (to 1:0:0 ?).

--- gcc/asan.c.jj   2013-10-29 11:58:30.0 +0100
+++ gcc/asan.c  2013-10-29 13:04:07.709667677 +0100
@@ -921,6 +921,15 @@ asan_clear_shadow (rtx shadow_mem, HOST_
   add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100);
 }
 
+void
+asan_function_start (void)
+{
+  section *fnsec = function_section (current_function_decl);
+  switch_to_section (fnsec);
+  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC",
+ current_function_funcdef_no);
+}
+
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
directly, epilogue sequence returned.  BASE is the register holding the
stack base, against which OFFSETS array offsets are relative to, OFFSETS
@@ -936,12 +945,13 @@ asan_emit_stack_protection (rtx base, HO
int length)
 {
   rtx shadow_base, shadow_mem, ret, mem;
+  char buf[30];
   unsigned char shadow_bytes[4];
   HOST_WIDE_INT base_offset = offsets[length - 1], offset, prev_offset;
   HOST_WIDE_INT last_offset, last_size;
   int l;
   unsigned char cur_shadow_byte = ASAN_STACK_MAGIC_LEFT;
-  tree str_cst;
+  tree str_cst, decl;
 
   if (shadow_ptr_types[0] == NULL_TREE)
 asan_init_shadow_ptr_types ();
@@ -949,7 +959,6 @@ asan_emit_stack_protection (rtx base, HO
   /* First of all, prepare the description string.  */
   pretty_printer asan_pp;
 
-  pp_string (&asan_pp, "");
   pp_decimal_int (&asan_pp, length / 2 - 1);
   pp_space (&asan_pp);
   for (l = length - 2; l; l -= 2)
@@ -980,7 +989,17 @@ asan_emit_stack_protection (rtx base, HO
   mem = adjust_address (mem, VOIDmode, GET_MODE_SIZE (ptr_mode));
   emit_move_insn (mem, expand_normal (str_cst));
   mem = adjust_address (mem, VOIDmode, GET_MODE_SIZE (ptr_mode));
-  emit_move_insn (mem, expand_normal (str_cst));  // FIXME: should be cur_pc.
+  ASM_GENERATE_INTERNAL_LABEL (buf, "LASANPC", current_function_funcdef_no);
+  decl = build_decl (DECL_SOURCE_LOCATION (current_function_decl),
+VAR_DECL, get_identifier (buf), char_type_node);
+  TREE_ADDRESSABLE (decl) = 1;
+  TREE_READONLY (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 1;
+  TREE_STATIC (decl) = 1;
+  TREE_PUBLIC (decl) = 0;
+  TREE_USED (decl) = 1;
+  emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
  GEN_INT (ASAN_SHADOW_SHIFT),
  NULL_RTX, 1, OPTAB_DIRECT);
@@ -1979,8 +1998,8 @@ asan_add_global (tree decl, tree type, v
 pp_string (&asan_pp, "");
   str_cst = asan_pp_string (&asan_pp);
 
-  pp_string(&module_name_pp, main_input_filename);
-  module_name_cst = asan_pp_string(&module_name_pp);
+  pp_string (&module_name_pp, main_input_filename);
+  module_name_cst = asan_pp_string (&module_name_pp);
 
   if (asan_needs_local_alias (decl))
 {
--- gcc/asan.h.jj   2013-01-11 09:02:50.0 +0100
+++ gcc/asan.h  2013-10-29 12:37:54.190798947 +0100
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.
 #ifndef TREE_ASAN
 #define TREE_ASAN
 
+extern void asan_function_start (void);
 extern void asan_finish_file (void);
 extern rtx asan_emit_stack_protection (rtx, HOST_WIDE_INT *, tree *, int);
 extern bool asan_protect_global (tree);
--- gcc/final.c.jj  2013-10-23 14:43:12.0 +0200
+++ gcc/final.c 2013-10-29 12:49:33.609176613 +0100
@@ -78,6 +78,7 @@ along w

[PATCH][ARM] New rtx cost table for Cortex-A7

2013-10-29 Thread Kyrill Tkachov

Hi all,

This patch adds the new rtx costs for the Cortex-A7 core as well as a new tuning 
structure to contain it.


Tested arm-none-eabi on qemu and no benchmark regressions.

Ok for trunk?

Thanks,
Kyrill

[gcc/]
2013-10-29  Kyrylo Tkachov  

* config/arm/arm.c (cortexa7_extra_costs): New table.
(arm_cortex_a7_tune): New.
* config/arm/arm-cores.def: Use cortex_a7 tuning for cortex-a7.diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 17c9bf3..79e2e87 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -125,7 +125,7 @@ ARM_CORE("arm1156t2-s",	  arm1156t2s,	6T2, FL_LDSCHED, v6t2)
 ARM_CORE("arm1156t2f-s",  arm1156t2fs,  6T2, FL_LDSCHED | FL_VFPV2, v6t2)
 ARM_CORE("generic-armv7-a", genericv7a,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a5",	  cortexa5,	7A, FL_LDSCHED, cortex_a5)
-ARM_CORE("cortex-a7",	  cortexa7,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
+ARM_CORE("cortex-a7",	  cortexa7,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a7)
 ARM_CORE("cortex-a8",	  cortexa8,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",	  cortexa9,	7A, FL_LDSCHED, cortex_a9)
 ARM_CORE("cortex-a15",	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4cdac60..768bc21 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1050,6 +1050,107 @@ const struct cpu_cost_table generic_extra_costs =
   }
 };
 
+const struct cpu_cost_table cortexa7_extra_costs =
+{
+  /* ALU */
+  {
+0,			/* Arith.  */
+0,			/* Logical.  */
+COSTS_N_INSNS (1),	/* Shift.  */
+COSTS_N_INSNS (1),	/* Shift_reg.  */
+COSTS_N_INSNS (1),	/* Arith_shift.  */
+COSTS_N_INSNS (1),	/* Arith_shift_reg.  */
+COSTS_N_INSNS (1),	/* Log_shift.  */
+COSTS_N_INSNS (1),	/* Log_shift_reg.  */
+COSTS_N_INSNS (1),	/* Extend.  */
+COSTS_N_INSNS (1),	/* Extend_arith.  */
+COSTS_N_INSNS (1),	/* Bfi.  */
+COSTS_N_INSNS (1),	/* Bfx.  */
+COSTS_N_INSNS (1),	/* Clz.  */
+0,			/* non_exec.  */
+true		/* non_exec_costs_exec.  */
+  },
+
+  {
+/* MULT SImode */
+{
+  0,			/* Simple.  */
+  COSTS_N_INSNS (1),	/* Flag_setting.  */
+  COSTS_N_INSNS (1),	/* Extend.  */
+  COSTS_N_INSNS (1),	/* Add.  */
+  COSTS_N_INSNS (1),	/* Extend_add.  */
+  COSTS_N_INSNS (7)		/* Idiv.  */
+},
+/* MULT DImode */
+{
+  0,			/* Simple (N/A).  */
+  0,			/* Flag_setting (N/A).  */
+  COSTS_N_INSNS (1),	/* Extend.  */
+  0,			/* Add.  */
+  COSTS_N_INSNS (2),	/* Extend_add.  */
+  0/* Idiv (N/A).  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (1),	/* Load.  */
+COSTS_N_INSNS (1),	/* Load_sign_extend.  */
+COSTS_N_INSNS (3),	/* Ldrd.  */
+COSTS_N_INSNS (1),	/* Ldm_1st.  */
+1,			/* Ldm_regs_per_insn_1st.  */
+2,			/* Ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (2),	/* Loadf.  */
+COSTS_N_INSNS (2),	/* Loadd.  */
+COSTS_N_INSNS (1),	/* Load_unaligned.  */
+COSTS_N_INSNS (1),	/* Store.  */
+COSTS_N_INSNS (3),	/* Strd.  */
+COSTS_N_INSNS (1),	/* Stm_1st.  */
+1,			/* Stm_regs_per_insn_1st.  */
+2,			/* Stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (2),	/* Storef.  */
+COSTS_N_INSNS (2),	/* Stored.  */
+COSTS_N_INSNS (1)	/* Store_unaligned.  */
+  },
+  {
+/* FP SFmode */
+{
+  COSTS_N_INSNS (15),	/* Div.  */
+  COSTS_N_INSNS (3),	/* Mult.  */
+  COSTS_N_INSNS (7),	/* Mult_addsub. */
+  COSTS_N_INSNS (7),	/* Fma.  */
+  COSTS_N_INSNS (3),	/* Addsub.  */
+  COSTS_N_INSNS (3),	/* Fpconst.  */
+  COSTS_N_INSNS (3),	/* Neg.  */
+  COSTS_N_INSNS (3),	/* Compare.  */
+  COSTS_N_INSNS (3),	/* Widen.  */
+  COSTS_N_INSNS (3),	/* Narrow.  */
+  COSTS_N_INSNS (3),	/* Toint.  */
+  COSTS_N_INSNS (3),	/* Fromint.  */
+  COSTS_N_INSNS (3)		/* Roundint.  */
+},
+/* FP DFmode */
+{
+  COSTS_N_INSNS (30),	/* Div.  */
+  COSTS_N_INSNS (6),	/* Mult.  */
+  COSTS_N_INSNS (10),	/* Mult_addsub.  */
+  COSTS_N_INSNS (7),	/* Fma.  */
+  COSTS_N_INSNS (3),	/* Addsub.  */
+  COSTS_N_INSNS (3),	/* Fpconst.  */
+  COSTS_N_INSNS (3),	/* Neg.  */
+  COSTS_N_INSNS (3),	/* Compare.  */
+  COSTS_N_INSNS (3),	/* Widen.  */
+  COSTS_N_INSNS (3),	/* Narrow.  */
+  COSTS_N_INSNS (3),	/* Toint.  */
+  COSTS_N_INSNS (3),	/* Fromint.  */
+  COSTS_N_INSNS (3)		/* Roundint.  */
+}
+  },
+  /* Vector */
+  {
+COSTS_N_INSNS (1)	/* Alu.  */
+  }
+};
+
 const struct cpu_cost_table cortexa15_extra_costs =
 {
   /* ALU */
@@ -1266,6 +1367,22 @@ const struct tune_params arm_cortex_tune =
   false /* Prefer Neon for 64-bits bitops.  */
 };
 
+const struct tune_params arm_cortex_a7_tune =
+{
+  arm_9e_rtx_costs,
+  &cortexa7_extra_costs,
+  NULL,
+  1,		/* Constant limit. 

Re: [PATCH][ARM] New rtx cost table for Cortex-A7

2013-10-29 Thread Ramana Radhakrishnan

On 10/29/13 12:15, Kyrill Tkachov wrote:

Hi all,

This patch adds the new rtx costs for the Cortex-A7 core as well as a new tuning
structure to contain it.

Tested arm-none-eabi on qemu and no benchmark regressions.

Ok for trunk?


Ok.

Ramana




Re: [wide-int] Treat order comparisons like other binary ops

2013-10-29 Thread Richard Biener
On Sun, 27 Oct 2013, Richard Sandiford wrote:

> Until now, eq_p and ne_p have enforced the same argument rules as things
> like addition, while order comparisons like lts_p have treated the two
> arguments as independent and signed.  Richard, I think you said on IRC
> that you thought lts_p should behave like the others.  E.g. lts_p on two
> trees or two rtxes should make sure that the precisions are the same.

Yes.

> This patch does that.  I think all uses of INT_CST_LT and INT_CST_LT_UNSIGNED
> are really comparing to "infinite" precision, and the UNSIGNED distinction is
> only there because double_int isn't wide enough to be effectively infinite.
> Is that right? 

Yes, it's only because of the case of a set MSB on the double-int.

> Since that isn't a problem with widest_int, the patch gets
> rid of INT_CST_LT_UNSIGNED and only uses INT_CST_LT.
>
> 
> The c-lex.c change includes a generic change that I just posted for trunk.
> 
> I rejigged the order in tree.h slightly so that it matches trunk.
> 
> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  OK for wide-int?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
> 
> 
> Index: gcc/c-family/c-common.c
> ===
> --- gcc/c-family/c-common.c   2013-10-27 14:11:53.006510519 +
> +++ gcc/c-family/c-common.c   2013-10-27 14:19:27.667578745 +
> @@ -4101,20 +4101,10 @@ shorten_compare (tree *op0_ptr, tree *op
> maxval = convert (*restype_ptr, maxval);
>   }
>  
> -  if (unsignedp && unsignedp0)
> - {
> -   min_gt = INT_CST_LT_UNSIGNED (primop1, minval);
> -   max_gt = INT_CST_LT_UNSIGNED (primop1, maxval);
> -   min_lt = INT_CST_LT_UNSIGNED (minval, primop1);
> -   max_lt = INT_CST_LT_UNSIGNED (maxval, primop1);
> - }
> -  else
> - {
> -   min_gt = INT_CST_LT (primop1, minval);
> -   max_gt = INT_CST_LT (primop1, maxval);
> -   min_lt = INT_CST_LT (minval, primop1);
> -   max_lt = INT_CST_LT (maxval, primop1);
> - }
> +  min_gt = INT_CST_LT (primop1, minval);
> +  max_gt = INT_CST_LT (primop1, maxval);
> +  min_lt = INT_CST_LT (minval, primop1);
> +  max_lt = INT_CST_LT (maxval, primop1);
>  
>val = 0;
>/* This used to be a switch, but Genix compiler can't handle that.  */
> Index: gcc/c-family/c-lex.c
> ===
> --- gcc/c-family/c-lex.c  2013-10-27 14:11:53.006510519 +
> +++ gcc/c-family/c-lex.c  2013-10-27 14:19:27.664578718 +
> @@ -48,9 +48,9 @@ static tree interpret_float (const cpp_t
>enum overflow_type *);
>  static tree interpret_fixed (const cpp_token *, unsigned int);
>  static enum integer_type_kind narrowest_unsigned_type
> - (const wide_int &, unsigned int);
> + (const widest_int &, unsigned int);
>  static enum integer_type_kind narrowest_signed_type
> - (const wide_int &, unsigned int);
> + (const widest_int &, unsigned int);
>  static enum cpp_ttype lex_string (const cpp_token *, tree *, bool, bool);
>  static tree lex_charconst (const cpp_token *);
>  static void update_header_times (const char *);
> @@ -526,7 +526,7 @@ c_lex_with_flags (tree *value, location_
> there isn't one.  */
>  
>  static enum integer_type_kind
> -narrowest_unsigned_type (const wide_int &val, unsigned int flags)
> +narrowest_unsigned_type (const widest_int &val, unsigned int flags)
>  {
>int itk;
>  
> @@ -545,7 +545,7 @@ narrowest_unsigned_type (const wide_int
>   continue;
>upper = TYPE_MAX_VALUE (integer_types[itk]);
>  
> -  if (wi::geu_p (upper, val))
> +  if (wi::geu_p (wi::to_widest (upper), val))
>   return (enum integer_type_kind) itk;
>  }
>  
> @@ -554,7 +554,7 @@ narrowest_unsigned_type (const wide_int
>  
>  /* Ditto, but narrowest signed type.  */
>  static enum integer_type_kind
> -narrowest_signed_type (const wide_int &val, unsigned int flags)
> +narrowest_signed_type (const widest_int &val, unsigned int flags)
>  {
>int itk;
>  
> @@ -573,7 +573,7 @@ narrowest_signed_type (const wide_int &v
>   continue;
>upper = TYPE_MAX_VALUE (integer_types[itk]);
>  
> -  if (wi::geu_p (upper, val))
> +  if (wi::geu_p (wi::to_widest (upper), val))
>   return (enum integer_type_kind) itk;
>  }
>  
> @@ -588,20 +588,18 @@ interpret_integer (const cpp_token *toke
>tree value, type;
>enum integer_type_kind itk;
>cpp_num integer;
> -  cpp_options *options = cpp_get_options (parse_in);
> -  HOST_WIDE_INT ival[2];
> -  wide_int wval;
> +  HOST_WIDE_INT ival[3];
>  
>*overflow = OT_NONE;
>  
>integer = cpp_interpret_integer (parse_in, token, flags);
> -  integer = cpp_num_sign_extend (integer, options->precision);
>if (integer.overflow)
>  *overflow = OT_OVERFLOW;
>  
>ival[0] = integer.low;
>ival[1] = integer.high;
> -  wval = wide_int::from_array (ival, 2, HOST_BITS_PER_WIDE_INT * 2);
>

Re: [wide-int] More optimisations

2013-10-29 Thread Richard Biener
On Sun, 27 Oct 2013, Richard Sandiford wrote:

> This patch adds some more optimisations to the wi:: comparison functions.
> It uses the:
> 
>   #define CONSTANT(X) (__builtin_constant_p (X) && (X))
> 
> idiom that was mentioned before, except that I thought CONSTANT would be
> too easily confused with CONSTANT_P, so I went for CAN_TELL instead.
> Better names welcome.

STATIC_CONSTANT_P similar to static_assert?

> The changes are:
> 
> - Add a fast path to eq_p for when one of the inputs isn't sign-extended.
>   This includes code to handle compile-time 0 specially.
> 
> - Add the opposite optimisation to Mike's lts_p change, if we can tell at
>   compile time that it applies.

I think the cases Mike added should only be enabled when we can figure
them out at compile-time, too.

Ok with that change.

Thanks,
Richard.

> - Add fast paths to ltu_p for constants.
> 
> E.g.:
> 
>   bool
>   f1 (const_tree x)
>   {
> return wi::eq_p (x, 0);
>   }
> 
> now gives:
> 
> xorl%eax, %eax
> cmpw$1, 4(%rdi)
> je  .L5
> rep ret
> .p2align 4,,10
> .p2align 3
> .L5:
> cmpq$0, 16(%rdi)
> sete%al
> ret
> 
>   bool
>   f2 (const_tree x, HOST_WIDE_INT y)
>   {
> return wi::eq_p (x, y);
>   }
> 
> gives:
> 
> movq8(%rdi), %rax
> movzwl  52(%rax), %edx
> xorl%eax, %eax
> andw$1023, %dx
> cmpw$1, 4(%rdi)
> je  .L10
> rep ret
> .p2align 4,,10
> .p2align 3
> .L10:
> xorq16(%rdi), %rsi
> movzwl  %dx, %edx
> movl$64, %ecx
> subl%edx, %ecx
> movq%rsi, %rax
> salq%cl, %rax
> testl   %ecx, %ecx
> cmovg   %rax, %rsi
> testq   %rsi, %rsi
> sete%al
> ret
> 
>   bool
>   f3 (HOST_WIDE_INT x, const_tree y)
>   {
> return wi::lts_p (x, y);
>   }
> 
> is similarly ugly because of the way that it ignores TYPE_SIGN and so has to
> explicitly sign-extend "small-prec" cases:
> 
> movq8(%rsi), %rax
> movzwl  4(%rsi), %ecx
> movzwl  52(%rax), %edx
> andl$1023, %edx
> cmpl$1, %ecx
> je  .L16
> leal-1(%rcx), %eax
> sall$6, %ecx
> subl%edx, %ecx
> movq16(%rsi,%rax,8), %rax
> movq%rax, %rdx
> salq%cl, %rdx
> testl   %ecx, %ecx
> cmovg   %rdx, %rax
> sarq$63, %rax
> addl$1, %eax
> ret
> .p2align 4,,10
> .p2align 3
> .L16:
> cmpl$63, %edx
> movq16(%rsi), %rax
> ja  .L13
> movb$64, %cl
> subl%edx, %ecx
> salq%cl, %rax
> sarq%cl, %rax
> .L13:
> cmpq%rdi, %rax
> setg%al
> ret
> 
> but:
> 
>   bool
>   f4 (HOST_WIDE_INT x, const_tree y)
>   {
> return wi::lts_p (x, wi::to_widest (y));
>   }
> 
> is a bit more respectable:
> 
> movzwl  6(%rsi), %eax
> cmpl$1, %eax
> je  .L20
> subl$1, %eax
> movq16(%rsi,%rax,8), %rax
> sarq$63, %rax
> addl$1, %eax
> ret
> .p2align 4,,10
> .p2align 3
> .L20:
> cmpq%rdi, 16(%rsi)
> setg%al
> ret
> 
> For similar reasons:
> 
>   bool
>   f5 (const_tree x)
>   {
> return wi::ltu_p (x, 100);
>   }
> 
> gives:
> 
> movq8(%rdi), %rax
> movzwl  52(%rax), %ecx
> xorl%eax, %eax
> andw$1023, %cx
> cmpw$1, 4(%rdi)
> je  .L26
> rep ret
> .p2align 4,,10
> .p2align 3
> .L26:
> cmpw$63, %cx
> ja  .L23
> movl$1, %eax
> salq%cl, %rax
> subq$1, %rax
> andq16(%rdi), %rax
> .L24:
> cmpq$99, %rax
> setbe   %al
> ret
> .p2align 4,,10
> .p2align 3
> .L23:
> movq16(%rdi), %rax
> jmp .L24
> 
> but:
> 
>   bool
>   f6 (const_tree x)
>   {
> return wi::ltu_p (wi::to_widest (x), 100);
>   }
> 
> gives:
> 
> xorl%eax, %eax
> cmpw$1, 6(%rdi)
> je  .L30
> rep ret
> .p2align 4,,10
> .p2align 3
> .L30:
> cmpq$99, 16(%rdi)
> setbe   %al
> ret
> 
> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  OK for wide-int?
> 
> Thanks,
> Richard
> 
> 
> Index: gcc/system.h
> ===
> --- gcc/system.h  2013-10-27 14:25:19.144723977 +
> +++ gcc/system.h  2013-10-27 14:25:20.716738045 +
> @@ -711,6 +711,12 @@ #define gcc_unreachable() __builtin_unre
>  #define gcc_unreachable() (fancy_abort (__FILE__, __LINE__, __FUNCTION__))
>  #endif
>  
> +#if GCC_VERSION >= 3001
> +#define CAN_TELL(X) (__builtin_constant_p (X) && (X))
> +

Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-29 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Wed, Oct 16, 2013 at 09:35:21AM +0400, Yury Gribov wrote:
>> >>> I've recently submitted a bug report regarding invalid
>> unpoisoning of stack frame redzones
>> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone
>> take a look at proposed patch (a simple one-liner) and check whether
>> it's ok for commit?
>> >>
>> >> Can you please be more verbose
>> >
>> > Do you think the proposed patch is ok or should I provide some
>> additional info?
>> 
>> Jakub, Dodji,
>> 
>> Any updates on this one? Note that this bug is a huge blocker for
>> using AddressSanitizer on ARM platforms.
>
> Sorry for the delay, I finally found time to look at it.
> While your patch fixes the issue, I wonder if for the case where
> shadow_mem before the asan_clear_shadow call is already of the form
> (mem (reg)) it isn't better to just adjust next asan_clear_shadow
> base from the end of the cleared region rather than from the start of it,
> because the end will be already in the right pseudo, while with your
> approach it needs to be done in a different register and potentially
> increase register pressure.  So here is (completely untested) alternate fix:

Hmm, it just seems wrong to be assigning to registers returned by force_reg
and expand_binop though.  Would this variation be OK?

Thanks,
Richard


Index: gcc/asan.c
===
--- gcc/asan.c  (revision 204124)
+++ gcc/asan.c  (working copy)
@@ -876,9 +876,10 @@
 }
 
/* Clear shadow memory at SHADOW_MEM, LEN bytes.  Can't call a library call here
-   though.  */
+   though.  If the address of the next byte is in a known register, return
+   that register, otherwise return null.  */
 
-static void
+static rtx
 asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 {
   rtx insn, insns, top_label, end, addr, tmp, jump;
@@ -893,12 +894,12 @@
   if (insn == NULL_RTX)
 {
   emit_insn (insns);
-  return;
+  return 0;
 }
 
   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
-  addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  addr = copy_to_mode_reg (Pmode, XEXP (shadow_mem, 0));
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);
@@ -912,6 +913,7 @@
   jump = get_last_insn ();
   gcc_assert (JUMP_P (jump));
   add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100);
+  return addr;
 }
 
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
@@ -1048,7 +1050,15 @@
   (last_offset - prev_offset)
   >> ASAN_SHADOW_SHIFT);
  prev_offset = last_offset;
- asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT);
+ rtx end_addr = asan_clear_shadow (shadow_mem,
+   last_size >> ASAN_SHADOW_SHIFT);
+ if (end_addr)
+   {
+ shadow_mem
+   = adjust_automodify_address (shadow_mem, VOIDmode, end_addr,
+last_size >> ASAN_SHADOW_SHIFT);
+ prev_offset += last_size;
+   }
  last_offset = offset;
  last_size = 0;
}



RE: [PING] [AArch64] Peepholes to generate ldp and stp instructions

2013-10-29 Thread Hurugalawadi, Naveen
Hi, 

>> You are better off CCing the maintainers for such reviews. Let me do
>> that for you. I cannot approve or reject this patch but I have a few
>> comments as below.

Thanks for the quick review and comments.
Please find attached the modified patch as per review comments.
Please review the same and let me know if it is okay.
 
Build and tested on aarch64-thunder-elf (using Cavium's internal
simulator). No new regressions.
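
For reference, the kind of code the new peepholes target (my own
illustration, not the attached testcase):

  long
  f (long *p)
  {
    /* Two consecutive loads from p[2] and p[3]; the first peephole
       merges them into a single ldp instruction.  */
    return p[2] + p[3];
  }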

2013-10-29   Naveen H.S  

gcc/
* config/aarch64/aarch64.md (peephole2 to generate ldp
instruction for 2 consecutive loads from memory): New.
(peephole2 to generate stp instruction for 2 consecutive
stores to memory in integer mode): New.
(peephole2 to generate ldp instruction for 2 consecutive
loads from memory in floating point mode): New.
(peephole2 to generate stp instruction for 2 consecutive
stores to memory in floating point mode): New.

gcc/testsuite
* gcc.target/aarch64/ldp-stp.c: New testcase.

Thanks,
Naveen

--- gcc/config/aarch64/aarch64.md	2013-10-28 17:15:52.363975264 +0530
+++ gcc/config/aarch64/aarch64.md	2013-10-29 17:40:48.516129561 +0530
@@ -1068,6 +1068,27 @@
(set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPI 0 "register_operand")
+	(match_operand:GPI 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPI 2 "register_operand")
+	(match_operand:GPI 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[1], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[1], 0), 1))
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && GP_REGNUM_P (REGNO (operands[0])) && GP_REGNUM_P (REGNO (operands[2]))
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+  == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	  (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair"
@@ -1085,6 +1106,27 @@
(set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand")
+	(match_operand:GPI 1 "register_operand"))
+   (set (match_operand:GPI 2 "memory_operand")
+	(match_operand:GPI 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[0], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[0], 0), 1))
+   && GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0))
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && GP_REGNUM_P (REGNO (operands[1])) && GP_REGNUM_P (REGNO (operands[3]))
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+  == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode, XEXP (operands[0], 0),
+  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	  (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "load_pair"
@@ -1102,6 +1144,27 @@
(set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPF 0 "register_operand")
+	(match_operand:GPF 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPF 2 "register_operand")
+	(match_operand:GPF 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[1], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[1], 0), 1))
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && FP_REGNUM_P (REGNO (operands[0])) && FP_REGNUM_P (REGNO (operands[2]))
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+  == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	  (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair"
@@ -1119,6 +1182,27 @@
(set_attr "mode" "<MODE>")]
 )
 
+(define_peephole2
+  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand")
+	(match_operand:GPF 1 "register_operand"))
+   (set (match_operand:GPF 2 "memory_operand")
+	(match_operand:GPF 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[0], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[0]

Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-29 Thread Jakub Jelinek
On Tue, Oct 29, 2013 at 12:25:33PM +, Richard Sandiford wrote:
> >> Any updates on this one? Note that this bug is a huge blocker for
> >> using AddressSanitizer on ARM platforms.
> >
> > Sorry for the delay, I finally found time to look at it.
> > While your patch fixes the issue, I wonder if for the case where
> > shadow_mem before the asan_clear_shadow call is already of the form
> > (mem (reg)) it isn't better to just adjust next asan_clear_shadow
> > base from the end of the cleared region rather than from the start of it,
> > because the end will be already in the right pseudo, while with your
> > approach it needs to be done in a different register and potentially
> > increase register pressure.  So here is (completely untested) alternate fix:
> 
> Hmm, it just seems wrong to be assigning to registers returned by force_reg
> and expand_binop though.  Would this variation be OK?

Why?  If it is a pseudo, it is certainly a pseudo that isn't used for
anything else, as it is the result of (base >> 3) + constant.  If it isn't a
pseudo, then supposedly it is better to just keep adding the offsets to
a base and thus not to use the end address from asan_clear_shadow.
Your patch would force using the end address even if shadow_base
is initially, say, some_reg + 32; I think it is better to use a shadow_mem of
some_reg + 72 next time instead of some_other_reg + 40.

Jakub


Re: [wide-int] More optimisations

2013-10-29 Thread Richard Sandiford
Richard Biener  writes:
> On Sun, 27 Oct 2013, Richard Sandiford wrote:
>> This patch adds some more optimisations to the wi:: comparison functions.
>> It uses the:
>> 
>>   #define CONSTANT(X) (__builtin_constant_p (X) && (X))
>> 
>> idiom that was mentioned before, except that I thought CONSTANT would be
>> too easily confused with CONSTANT_P, so I went for CAN_TELL instead.
>> Better names welcome.
>
> STATIC_CONSTANT_P similar to static_assert?

Sounds good.

>> The changes are:
>> 
>> - Add a fast path to eq_p for when one of the inputs isn't sign-extended.
>>   This includes code to handle compile-time 0 specially.
>> 
>> - Add the opposite optimisation to Mike's lts_p change, if we can tell at
>>   compile time that it applies.
>
> I think the cases Mike added should only be enabled when we can figure
> them out at compile-time, too.

Well, the idea with most of these functions was to handle the simple
"everything is one HWI" cases inline.  For operations like addition this
needs to be based on the precision, since that's the only cheap way of
knowing whether the result is also a single HWI.  But for logic operations
like AND and OR, it can be based on whether the inputs are single HWIs,
since single-HWI inputs give a single-HWI output.  This catches more cases.

lts_p is a bit like AND and OR in this respect.  fits_shwi_p is the
same as "len == 1", which is simple to check and is often true.  If we
restrict the optimisation to STATIC_CONSTANT_P we'll end up using the
out-of-line comparison for all TREE_INT_CST_LTs, even the int-int ones
that we currently handle inline.
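
As a self-contained sketch of the idiom (the real wi:: types and helper
names differ; "wi_sketch" and "lts_p_large" here are stand-ins):

#define STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))

/* Toy stand-in for the wide-int representation: LEN significant
   HOST_WIDE_INTs stored in VAL.  */
struct wi_sketch
{
  long val[4];
  unsigned int len;
};

extern bool lts_p_large (const wi_sketch &, const wi_sketch &);

static inline bool
lts_p_sketch (const wi_sketch &x, const wi_sketch &y)
{
  /* Runtime fast path: "everything is one HWI" is cheap to test and
     usually true, so compare inline...  */
  if (x.len == 1 && y.len == 1)
    return x.val[0] < y.val[0];
  /* ...and go out of line only for multi-HWI operands.  A
     STATIC_CONSTANT_P guard, by contrast, enables a shortcut only when
     the condition is provable at compile time, so the test itself costs
     nothing at runtime.  */
  return lts_p_large (x, y);
}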

Thanks,
Richard



Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins

2013-10-29 Thread Jeff Law

On 10/29/13 04:17, Richard Biener wrote:

On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law  wrote:

On 10/25/13 11:57, Ilya Enkovich wrote:



There are currently two known issues with LTO. The first one is an ICE in
the LTO streamer when it reads instrumented code. The second one is an
uninitialized flag_check_pointers when code is compiled by lto1 (do you
know why that may happen, BTW?). It also causes an ICE because instrumented
code is met when not expected. Of course, I'll fix these problems
anyway, but I was going to allow '-fcheck-pointers -flto' only when the
checker testsuite has a 100% pass rate with LTO.


No idea about why LTO would be failing in this way -- my experience with LTO
is virtually nil.

I'm OK with the restriction as a temporary measure.  But we really dislike
this kind of stuff, so I need a commitment from you to dive into these
problems and sort out what's really going on.


I'm not ok with adding such arbitrary restrictions.  LTO support should
be trivial.
In that case, Ilya, you've got to resolve the LTO issues before this 
can move forward.


jeff



Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-29 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Tue, Oct 29, 2013 at 12:25:33PM +, Richard Sandiford wrote:
>> >> Any updates on this one? Note that this bug is a huge blocker for
>> >> using AddressSanitizer on ARM platforms.
>> >
>> > Sorry for the delay, I finally found time to look at it.
>> > While your patch fixes the issue, I wonder if for the case where
>> > shadow_mem before the asan_clear_shadow call is already of the form
>> > (mem (reg)) it isn't better to just adjust next asan_clear_shadow
>> > base from the end of the cleared region rather than from the start of it,
>> > because the end will be already in the right pseudo, while with your
>> > approach it needs to be done in a different register and potentially
>> > increase register pressure.  So here is (completely untested) alternate 
>> > fix:
>> 
>> Hmm, it just seems wrong to be assigning to registers returned by force_reg
>> and expand_binop though.  Would this variation be OK?
>
> Why?

Well, that was the traditional rule:

   The caller must not alter the value in the register we return,
   since we mark it as a "constant" register.  */

rtx
force_reg (enum machine_mode mode, rtx x)
{

but maybe it doesn't matter now, since the constant register flag has
gone and since we seem to use REG_EQUAL rather than REG_EQUIV...

> If it is a pseudo, it is certainly a pseudo that isn't used for
> anything else, as it is the result of (base >> 3) + constant, if it isn't a
> pseudo, then supposedly it is better not to just keep adding the offsets to
> a base and thus not to use the end address from asan_clear_shadow.
> Your patch would force using the end address even if shadow_base
> is initially say some_reg + 32, I think it is better to use shadow_mem
> some_reg + 72 next time instead of some_other_reg + 40.

But I thought the register pressure thing was to avoid having the
original base live across the loop.  Doesn't that apply for reg + offset
as well as plain reg?  If we have to force some_reg + 32 into a
temporary and then loop on it, using the new base should avoid keeping
both it and some_reg live at the same time.  Plus smaller offsets are
often better on some targets.

Thanks,
Richard



Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-10-29 Thread Jakub Jelinek
On Tue, Oct 29, 2013 at 01:06:21PM +, Richard Sandiford wrote:
> > If it is a pseudo, it is certainly a pseudo that isn't used for
> > anything else, as it is the result of (base >> 3) + constant, if it isn't a
> > pseudo, then supposedly it is better not to just keep adding the offsets to
> > a base and thus not to use the end address from asan_clear_shadow.
> > Your patch would force using the end address even if shadow_base
> > is initially say some_reg + 32, I think it is better to use shadow_mem
> > some_reg + 72 next time instead of some_other_reg + 40.
> 
> But I thought the register pressure thing was to avoid having the
> original base live across the loop.  Doesn't that apply for reg + offset
> as well as plain reg?  If we have to force some_reg + 32 into a
> temporary and then loop on it, using the new base should avoid keeping
> both it and some_reg live at the same time.  Plus smaller offsets are
> often better on some targets.

On the other side, if we force the address to be just register based when it
previously wasn't, it makes life harder for alias analysis etc. (at least in
the past, say on ia64, I remember that the fact that it didn't allow anything
but registers as valid addresses often resulted in tons of missed
optimizations), so in the end I guess I have nothing against the original
patch (with whatever form of the copy-to-reg call is desirable).

Jakub


Re: libsanitizer merge from upstream r191666

2013-10-29 Thread Jakub Jelinek
On Tue, Oct 29, 2013 at 06:49:30AM -0700, Konstantin Serebryany wrote:
> Thanks!
> (At this time I will be slow with response due to travel)

BTW, don't have compiled clang/llvm pre-3.4 around to look at, for the use
after return, do you emit always the the __asan_*malloc/free calls for the
stack vars, or only conditionally based on some compiler flag (and in the
latter case, is that flag on by default or not)?

Jakub


Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins

2013-10-29 Thread Ilya Enkovich
On 29 Oct 06:59, Jeff Law wrote:
> On 10/29/13 04:17, Richard Biener wrote:
> >On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law  wrote:
> >>On 10/25/13 11:57, Ilya Enkovich wrote:
> >>
> >>>
> >>>There are currently two known issues with LTO. The first one is an ICE in
> >>>the LTO streamer when it reads instrumented code. The second one is an
> >>>uninitialized flag_check_pointers when code is compiled by lto1 (do you
> >>>know why that may happen, BTW?). It also causes an ICE because instrumented
> >>>code is met when not expected. Of course, I'll fix these problems
> >>>anyway, but I was going to allow '-fcheck-pointers -flto' only when the
> >>>checker testsuite has a 100% pass rate with LTO.
> >>
> >>No idea about why LTO would be failing in this way -- my experience with LTO
> >>is virtually nil.
> >>
> >>I'm OK with the restriction as a temporary measure.  But we really dislike
> >>this kind of stuff, so I need a commitment from you to dive into these
> >>problems and sort out what's really going on.
> >
> >I'm not ok with adding such arbitrary restrictions.  LTO support should
> >be trivial.
> In  that case, Ilya, you've got to resolve the LTO issues before
> this can move forward.
> 
> jeff
> 

Yeah.  I'm working on it right now.  I've fixed known issues and now I'm 
looking for others.  Meanwhile, here is a new patch version with the required 
renames and without the LTO restriction.

Thanks,
Ilya
--

gcc/

2013-10-29  Ilya Enkovich  

* builtin-types.def (BT_FN_VOID_CONST_PTR): New.
(BT_FN_PTR_CONST_PTR): New.
(BT_FN_CONST_PTR_CONST_PTR): New.
(BT_FN_PTR_CONST_PTR_SIZE): New.
(BT_FN_PTR_CONST_PTR_CONST_PTR): New.
(BT_FN_VOID_PTRPTR_CONST_PTR): New.
(BT_FN_VOID_CONST_PTR_SIZE): New.
(BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New.
* chkp-builtins.def: New.
* builtins.def: include chkp-builtins.def.
(DEF_CHKP_BUILTIN): New.
* builtins.c (expand_builtin): Support BUILT_IN_CHKP_INIT_PTR_BOUNDS,
BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS,
BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS,
BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS,
BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS,
BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND,
BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL,
BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET,
BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND, BUILT_IN_CHKP_NARROW,
BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER.
* common.opt (fcheck-pointer-bounds): New.
* toplev.c (process_options): Check Pointer Bounds Checker is supported.
* doc/extend.texi: Document Pointer Bounds Checker built-in functions.


diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 3deedba..1f9ae4e 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -226,6 +226,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, 
BT_DFLOAT64)
 DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
 DEF_FUNCTION_TYPE_1 (BT_FN_ULONG_ULONG, BT_ULONG, BT_ULONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_ULONGLONG_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG)
@@ -233,6 +234,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT16, BT_UINT16, 
BT_UINT16)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT32_UINT32, BT_UINT32, BT_UINT32)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT64_UINT64, BT_UINT64, BT_UINT64)
 DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
+DEF_FUNCTION_TYPE_1 (BT_FN_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_CONST_PTR_CONST_PTR, BT_CONST_PTR, BT_CONST_PTR)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR, BT_FN_VOID_PTR)
 
@@ -346,6 +349,10 @@ DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_SIZE_CONST_VPTR, BT_BOOL, 
BT_SIZE,
 BT_CONST_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_INT_BOOL, BT_BOOL, BT_INT, BT_BOOL)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT_UINT, BT_VOID, BT_UINT, BT_UINT)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_SIZE, BT_PTR, BT_CONST_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR, 
BT_CONST_PTR)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRPTR_CONST_PTR, BT_VOID, BT_PTR_PTR, 
BT_CONST_PTR)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)
 
@@ -428,6 +435,7 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, 
BT_VOLATILE_PTR, BT_I2, BT
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, 
BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, BT_I8, 
BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_

Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP

2013-10-29 Thread Jakub Jelinek
On Tue, Oct 29, 2013 at 12:28:49PM +0100, Richard Biener wrote:
> Otherwise ok.

So like this?

2013-10-29  Jakub Jelinek  

* tree-cfg.c (assert_unreachable_fallthru_edge_p): New function.
* tree-cfg.h (assert_unreachable_fallthru_edge_p): New prototype.
* tree-vrp.c (all_imm_uses_in_stmt_or_feed_cond): New function.
(remove_range_assertions): If ASSERT_EXPR_VAR has no other immediate
uses but in the condition and ASSERT_EXPR and the other successor of
the predecessor bb is __builtin_unreachable (), set_range_info of the
ASSERT_EXPR_VAR to the range info of the ASSERT_EXPR's lhs.

--- gcc/tree-cfg.c.jj   2013-10-29 09:25:55.0 +0100
+++ gcc/tree-cfg.c  2013-10-29 14:32:53.235150267 +0100
@@ -380,6 +380,50 @@ computed_goto_p (gimple t)
  && TREE_CODE (gimple_goto_dest (t)) != LABEL_DECL);
 }
 
+/* Returns true for edge E where e->src ends with a GIMPLE_COND and
+   the other edge points to a bb with just __builtin_unreachable ().
+   I.e. return true for C->M edge in:
+   <bb C>:
+   ...
+   if (something)
+     goto <bb N>;
+   else
+     goto <bb M>;
+   <bb N>:
+   __builtin_unreachable ();
+   <bb M>:  */
+
+bool
+assert_unreachable_fallthru_edge_p (edge e)
+{
+  basic_block pred_bb = e->src;
+  gimple last = last_stmt (pred_bb);
+  if (last && gimple_code (last) == GIMPLE_COND)
+{
+  basic_block other_bb = EDGE_SUCC (pred_bb, 0)->dest;
+  if (other_bb == e->dest)
+   other_bb = EDGE_SUCC (pred_bb, 1)->dest;
+  if (EDGE_COUNT (other_bb->succs) == 0)
+   {
+ gimple_stmt_iterator gsi = gsi_after_labels (other_bb);
+ gimple stmt;
+
+ if (gsi_end_p (gsi))
+   return false;
+ stmt = gsi_stmt (gsi);
+ if (is_gimple_debug (stmt))
+   {
+ gsi_next_nondebug (&gsi);
+ if (gsi_end_p (gsi))
+   return false;
+ stmt = gsi_stmt (gsi);
+   }
+ return gimple_call_builtin_p (stmt, BUILT_IN_UNREACHABLE);
+   }
+}
+  return false;
+}
+
 
 /* Search the CFG for any computed gotos.  If found, factor them to a
common computed goto site.  Also record the location of that site so
--- gcc/tree-vrp.c.jj   2013-10-29 09:25:38.382477963 +0100
+++ gcc/tree-vrp.c  2013-10-29 14:36:23.695060600 +0100
@@ -6459,6 +6459,33 @@ check_all_array_refs (void)
 }
 }
 
+/* Return true if all imm uses of VAR are either in STMT, or
+   feed (optionally through a chain of single imm uses) GIMPLE_COND
+   in basic block COND_BB.  */
+
+static bool
+all_imm_uses_in_stmt_or_feed_cond (tree var, gimple stmt, basic_block cond_bb)
+{
+  use_operand_p use_p, use2_p;
+  imm_use_iterator iter;
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, var)
+if (USE_STMT (use_p) != stmt)
+  {
+   gimple use_stmt = USE_STMT (use_p);
+   if (is_gimple_debug (use_stmt))
+ continue;
+   while (is_gimple_assign (use_stmt)
+  && single_imm_use (gimple_assign_lhs (use_stmt),
+ &use2_p, &use_stmt))
+ ;
+   if (gimple_code (use_stmt) != GIMPLE_COND
+   || gimple_bb (use_stmt) != cond_bb)
+ return false;
+  }
+  return true;
+}
+
 /* Convert range assertion expressions into the implied copies and
copy propagate away the copies.  Doing the trivial copy propagation
here avoids the need to run the full copy propagation pass after
@@ -6488,12 +6515,16 @@ remove_range_assertions (void)
 {
   basic_block bb;
   gimple_stmt_iterator si;
+  /* 1 if looking at ASSERT_EXPRs immediately at the beginning of
+ a basic block preceded by GIMPLE_COND branching to it and
+ __builtin_unreachable (), -1 if not yet checked, 0 otherwise.  */
+  int is_unreachable;
 
   /* Note that the BSI iterator bump happens at the bottom of the
  loop and no bump is necessary if we're removing the statement
  referenced by the current BSI.  */
   FOR_EACH_BB (bb)
-for (si = gsi_start_bb (bb); !gsi_end_p (si);)
+for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);)
   {
gimple stmt = gsi_stmt (si);
gimple use_stmt;
@@ -6501,6 +6532,7 @@ remove_range_assertions (void)
if (is_gimple_assign (stmt)
&& gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
  {
+   tree lhs = gimple_assign_lhs (stmt);
tree rhs = gimple_assign_rhs1 (stmt);
tree var;
tree cond = fold (ASSERT_EXPR_COND (rhs));
@@ -6509,22 +6541,49 @@ remove_range_assertions (void)
 
gcc_assert (cond != boolean_false_node);
 
-   /* Propagate the RHS into every use of the LHS.  */
var = ASSERT_EXPR_VAR (rhs);
-   FOR_EACH_IMM_USE_STMT (use_stmt, iter,
-  gimple_assign_lhs (stmt))
+   gcc_assert (TREE_CODE (var) == SSA_NAME);
+
+   if (!POINTER_TYPE_P (TREE_TYPE (lhs))
+   && SSA_NAME_RANGE_INFO (lhs))
+ {
+   i

Re: libsanitizer merge from upstream r191666

2013-10-29 Thread Konstantin Serebryany
On Tue, Oct 29, 2013 at 6:52 AM, Jakub Jelinek  wrote:
> On Tue, Oct 29, 2013 at 06:49:30AM -0700, Konstantin Serebryany wrote:
>> Thanks!
>> (At this time I will be slow with response due to travel)
>
> BTW, don't have compiled clang/llvm pre-3.4 around to look at, for the use
> after return, do you emit always the the __asan_*malloc/free calls for the
> stack vars, or only conditionally based on some compiler flag (and in the
> latter case, is that flag on by default or not)?

The calls are emitted by default, but the __asan_stack_malloc call is
done under a run-time flag
__asan_option_detect_stack_use_after_return.
So, to use the stack-use-after-return feature you should simply
compile with -fsanitize=address and then at run-time
pass ASAN_OPTIONS=detect_stack_use_after_return=1
For small stack frame sizes the call to __asan_stack_free is inlined
(as a performance optimization, not mandatory).

Here is what the IR looks like:

void bar(void*);
void foo() {
  char x[1];
  bar(&x);
}


define void @foo() #0 {
entry:
  %MyAlloca = alloca [10080 x i8], align 32
  %0 = ptrtoint [10080 x i8]* %MyAlloca to i64
  %1 = load i32* @__asan_option_detect_stack_use_after_return
  %2 = icmp ne i32 %1, 0
  br i1 %2, label %3, label %5

; <label>:3   ; preds = %entry
  %4 = call i64 @__asan_stack_malloc_8(i64 10080, i64 %0)
  br label %5

; <label>:5   ; preds = %entry, %3
  %6 = phi i64 [ %0, %entry ], [ %4, %3 ]
...
  call void @__asan_stack_free_8(i64 %6, i64 10080, i64 %0)
  ret void
}


So, when the UAR is disabled at run-time we pay a very small
additional price for this code
(less than 1% on avg on SPEC).
When it is enabled, we pay more:
https://code.google.com/p/address-sanitizer/wiki/UseAfterReturn#Performance
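
For reference, a rough C rendering of the instrumented foo above
(hand-translated from the IR; the frame size and the __asan_* entry
points are the ones shown in the IR, the offset of x is schematic):

extern int __asan_option_detect_stack_use_after_return;
extern unsigned long __asan_stack_malloc_8 (unsigned long size,
                                            unsigned long real_stack);
extern void __asan_stack_free_8 (unsigned long ptr, unsigned long size,
                                 unsigned long real_stack);
extern void bar (void *);

void
foo_instrumented (void)
{
  char frame[10080] __attribute__ ((aligned (32)));
  unsigned long base = (unsigned long) frame;

  /* Divert the frame to the fake stack only when the run-time flag is
     set; otherwise keep using the real stack frame.  */
  if (__asan_option_detect_stack_use_after_return)
    base = __asan_stack_malloc_8 (10080, base);

  /* The body addresses x relative to BASE (offset schematic).  */
  bar ((void *) (base + 32));

  __asan_stack_free_8 (base, 10080, (unsigned long) frame);
}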

--kcc

>
> Jakub


Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP

2013-10-29 Thread Richard Biener
On Tue, 29 Oct 2013, Jakub Jelinek wrote:

> On Tue, Oct 29, 2013 at 12:28:49PM +0100, Richard Biener wrote:
> > Otherwise ok.
> 
> So like this?

Yes, Thanks.
Richard.

> 2013-10-29  Jakub Jelinek  
> 
>   * tree-cfg.c (assert_unreachable_fallthru_edge_p): New function.
>   * tree-cfg.h (assert_unreachable_fallthru_edge_p): New prototype.
>   * tree-vrp.c (all_imm_uses_in_stmt_or_feed_cond): New function.
>   (remove_range_assertions): If ASSERT_EXPR_VAR has no other immediate
>   uses but in the condition and ASSERT_EXPR and the other successor of
>   the predecessor bb is __builtin_unreachable (), set_range_info of the
>   ASSERT_EXPR_VAR to the range info of the ASSERT_EXPR's lhs.
> 
> --- gcc/tree-cfg.c.jj 2013-10-29 09:25:55.0 +0100
> +++ gcc/tree-cfg.c2013-10-29 14:32:53.235150267 +0100
> @@ -380,6 +380,50 @@ computed_goto_p (gimple t)
> && TREE_CODE (gimple_goto_dest (t)) != LABEL_DECL);
>  }
>  
> +/* Returns true for edge E where e->src ends with a GIMPLE_COND and
> +   the other edge points to a bb with just __builtin_unreachable ().
> +   I.e. return true for C->M edge in:
> +   <bb C>:
> +   ...
> +   if (something)
> +     goto <bb N>;
> +   else
> +     goto <bb M>;
> +   <bb N>:
> +   __builtin_unreachable ();
> +   <bb M>:  */
> +
> +bool
> +assert_unreachable_fallthru_edge_p (edge e)
> +{
> +  basic_block pred_bb = e->src;
> +  gimple last = last_stmt (pred_bb);
> +  if (last && gimple_code (last) == GIMPLE_COND)
> +{
> +  basic_block other_bb = EDGE_SUCC (pred_bb, 0)->dest;
> +  if (other_bb == e->dest)
> + other_bb = EDGE_SUCC (pred_bb, 1)->dest;
> +  if (EDGE_COUNT (other_bb->succs) == 0)
> + {
> +   gimple_stmt_iterator gsi = gsi_after_labels (other_bb);
> +   gimple stmt;
> +
> +   if (gsi_end_p (gsi))
> + return false;
> +   stmt = gsi_stmt (gsi);
> +   if (is_gimple_debug (stmt))
> + {
> +   gsi_next_nondebug (&gsi);
> +   if (gsi_end_p (gsi))
> + return false;
> +   stmt = gsi_stmt (gsi);
> + }
> +   return gimple_call_builtin_p (stmt, BUILT_IN_UNREACHABLE);
> + }
> +}
> +  return false;
> +}
> +
>  
>  /* Search the CFG for any computed gotos.  If found, factor them to a
> common computed goto site.  Also record the location of that site so
> --- gcc/tree-vrp.c.jj 2013-10-29 09:25:38.382477963 +0100
> +++ gcc/tree-vrp.c2013-10-29 14:36:23.695060600 +0100
> @@ -6459,6 +6459,33 @@ check_all_array_refs (void)
>  }
>  }
>  
> +/* Return true if all imm uses of VAR are either in STMT, or
> +   feed (optionally through a chain of single imm uses) GIMPLE_COND
> +   in basic block COND_BB.  */
> +
> +static bool
> +all_imm_uses_in_stmt_or_feed_cond (tree var, gimple stmt, basic_block 
> cond_bb)
> +{
> +  use_operand_p use_p, use2_p;
> +  imm_use_iterator iter;
> +
> +  FOR_EACH_IMM_USE_FAST (use_p, iter, var)
> +if (USE_STMT (use_p) != stmt)
> +  {
> + gimple use_stmt = USE_STMT (use_p);
> + if (is_gimple_debug (use_stmt))
> +   continue;
> + while (is_gimple_assign (use_stmt)
> +&& single_imm_use (gimple_assign_lhs (use_stmt),
> +   &use2_p, &use_stmt))
> +   ;
> + if (gimple_code (use_stmt) != GIMPLE_COND
> + || gimple_bb (use_stmt) != cond_bb)
> +   return false;
> +  }
> +  return true;
> +}
> +
>  /* Convert range assertion expressions into the implied copies and
> copy propagate away the copies.  Doing the trivial copy propagation
> here avoids the need to run the full copy propagation pass after
> @@ -6488,12 +6515,16 @@ remove_range_assertions (void)
>  {
>basic_block bb;
>gimple_stmt_iterator si;
> +  /* 1 if looking at ASSERT_EXPRs immediately at the beginning of
> + a basic block preceded by GIMPLE_COND branching to it and
> + __builtin_unreachable (), -1 if not yet checked, 0 otherwise.  */
> +  int is_unreachable;
>  
>/* Note that the BSI iterator bump happens at the bottom of the
>   loop and no bump is necessary if we're removing the statement
>   referenced by the current BSI.  */
>FOR_EACH_BB (bb)
> -for (si = gsi_start_bb (bb); !gsi_end_p (si);)
> +for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);)
>{
>   gimple stmt = gsi_stmt (si);
>   gimple use_stmt;
> @@ -6501,6 +6532,7 @@ remove_range_assertions (void)
>   if (is_gimple_assign (stmt)
>   && gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
> {
> + tree lhs = gimple_assign_lhs (stmt);
>   tree rhs = gimple_assign_rhs1 (stmt);
>   tree var;
>   tree cond = fold (ASSERT_EXPR_COND (rhs));
> @@ -6509,22 +6541,49 @@ remove_range_assertions (void)
>  
>   gcc_assert (cond != boolean_false_node);
>  
> - /* Propagate the RHS into every use of the LHS.  */
>   var = ASSERT_EXPR_VAR (rhs);
> - FOR_EACH_IMM_USE_STMT (use_stmt, iter,
> -

[PATCH] More restrict testcases

2013-10-29 Thread Richard Biener

Just figured these out while "optimizing" Micha's ADD_RESTRICT patch ...
(well, make it "work").

Tested on x86_64-unknown-linux-gnu, committed.

Richard.

2013-10-29  Richard Biener  

* gcc.dg/torture/restrict-2.c: New testcase.
* gcc.dg/torture/restrict-3.c: Likewise.
* gcc.dg/torture/restrict-4.c: Likewise.
* gcc.dg/torture/restrict-5.c: Likewise.

Index: gcc/testsuite/gcc.dg/torture/restrict-2.c
===
--- gcc/testsuite/gcc.dg/torture/restrict-2.c   (revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-2.c   (working copy)
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * p)
+{
+  int * __restrict pr = p;
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int *q)
+{
+  int * __restrict qr = q;
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-3.c
===
--- gcc/testsuite/gcc.dg/torture/restrict-3.c   (revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-3.c   (working copy)
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * __restrict pr)
+{
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int *q)
+{
+  int * __restrict qr = q;
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-4.c
===
--- gcc/testsuite/gcc.dg/torture/restrict-4.c   (revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-4.c   (working copy)
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * p)
+{
+  int * __restrict pr = p;
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int * __restrict qr)
+{
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-5.c
===
--- gcc/testsuite/gcc.dg/torture/restrict-5.c   (revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-5.c   (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * __restrict pr)
+{
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int * __restrict qr)
+{
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+abort ();
+  return 0;
+}


[PATCH, 4.8, PR 58789] Backport cgraph_get_create_real_symbol_node and PR 57084 fix

2013-10-29 Thread Martin Jambor
Hi,

PR 58789 has been fixed on trunk by revision 198743 which needs
cgraph_get_create_real_symbol_node introduced in revision 196750.
This patch backports both.  It has been pre-approved by Honza in
person and passed bootstrap and testing on the branch.  I am about
to commit it momentarily.

Thanks,

Martin


2013-10-25  Martin Jambor  

PR middle-end/58789
Backport from mainline
2013-05-09  Martin Jambor  

PR lto/57084
* gimple-fold.c (canonicalize_constructor_val): Call
cgraph_get_create_real_symbol_node instead of cgraph_get_create_node.

Backport from mainline
2013-03-16  Jan Hubicka  

* cgraph.h (cgraph_get_create_real_symbol_node): Declare.
* cgraph.c (cgraph_get_create_real_symbol_node): New function.
* cgrpahbuild.c: Use cgraph_get_create_real_symbol_node instead
of cgraph_get_create_node.
* ipa-prop.c (ipa_make_edge_direct_to_target): Likewise.


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 8c1efb4..fd3aade 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2596,4 +2596,47 @@ verify_cgraph (void)
   FOR_EACH_FUNCTION (node)
 verify_cgraph_node (node);
 }
+
+/* Create external decl node for DECL.
+   The difference between cgraph_get_create_node and
+   cgraph_get_create_real_symbol_node is that cgraph_get_create_node
+   may return an inline clone, while cgraph_get_create_real_symbol_node
+   will create a new node in this case.
+   FIXME: This function should be removed once clones are put out of decl
+   hash.  */
+
+struct cgraph_node *
+cgraph_get_create_real_symbol_node (tree decl)
+{
+  struct cgraph_node *first_clone = cgraph_get_node (decl);
+  struct cgraph_node *node;
+  /* Create symbol table node.  Even if an inline clone exists, we can not take
+     it as a target of a non-inlined call.  */
+  node = cgraph_get_node (decl);
+  if (node && !node->global.inlined_to)
+return node;
+
+  node = cgraph_create_node (decl);
+
+  /* OK, we previously inlined the function, then removed the offline copy and
+     now we want it back for an external call.  This can happen when
+     devirtualizing while inlining a function called once, which happens after
+     extern inlines and virtuals are already removed.  In this case introduce
+     the external node and make it available for the call.  */
+  if (first_clone)
+{
+  first_clone->clone_of = node;
+  node->clones = first_clone;
+  symtab_prevail_in_asm_name_hash ((symtab_node) node);
+  symtab_insert_node_to_hashtable ((symtab_node) node);
+  if (dump_file)
+   fprintf (dump_file, "Introduced new external node "
+"(%s/%i) and turned into root of the clone tree.\n",
+xstrdup (cgraph_node_name (node)), node->uid);
+}
+  else if (dump_file)
+fprintf (dump_file, "Introduced new external node "
+"(%s/%i).\n", xstrdup (cgraph_node_name (node)), node->uid);
+  return node;
+}
 #include "gt-cgraph.h"
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 5df7fb4..8ab7ae1 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -575,6 +575,7 @@ struct cgraph_indirect_call_info 
*cgraph_allocate_init_indirect_info (void);
 struct cgraph_node * cgraph_create_node (tree);
 struct cgraph_node * cgraph_create_empty_node (void);
 struct cgraph_node * cgraph_get_create_node (tree);
+struct cgraph_node * cgraph_get_create_real_symbol_node (tree);
 struct cgraph_node * cgraph_same_body_alias (struct cgraph_node *, tree, tree);
 struct cgraph_node * cgraph_add_thunk (struct cgraph_node *, tree, tree, bool, 
HOST_WIDE_INT,
   HOST_WIDE_INT, tree, tree);
diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c
index fb01f24..a74a4c0 100644
--- a/gcc/cgraphbuild.c
+++ b/gcc/cgraphbuild.c
@@ -73,7 +73,7 @@ record_reference (tree *tp, int *walk_subtrees, void *data)
   decl = get_base_var (*tp);
   if (TREE_CODE (decl) == FUNCTION_DECL)
{
- struct cgraph_node *node = cgraph_get_create_node (decl);
+ struct cgraph_node *node = cgraph_get_create_real_symbol_node (decl);
  if (!ctx->only_vars)
cgraph_mark_address_taken_node (node);
  ipa_record_reference ((symtab_node)ctx->varpool_node,
@@ -143,7 +143,7 @@ record_eh_tables (struct cgraph_node *node, struct function 
*fun)
 {
   struct cgraph_node *per_node;
 
-  per_node = cgraph_get_create_node (DECL_FUNCTION_PERSONALITY 
(node->symbol.decl));
+  per_node = cgraph_get_create_real_symbol_node (DECL_FUNCTION_PERSONALITY 
(node->symbol.decl));
   ipa_record_reference ((symtab_node)node, (symtab_node)per_node, 
IPA_REF_ADDR, NULL);
   cgraph_mark_address_taken_node (per_node);
 }
@@ -223,7 +223,7 @@ mark_address (gimple stmt, tree addr, void *data)
   addr = get_base_address (addr);
   if (TREE_CODE (addr) == FUNCTION_DECL)
 {
-  struct cgraph_node *node = cgraph_get_create_node (addr);
+  struct cgraph_node *node = cgraph_get_create_

Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2

2013-10-29 Thread Sandra Loosemore

On 10/29/2013 02:51 AM, Bernd Edlinger wrote:

On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:


I again tried backporting the patch series along with your fix to GCC
4.8 and tested on arm-none-eabi. I found that it was still getting
stuck in infinite recursion unless the test from this patch hunk



hmmm
Actually I used your patch on a clean 4.8.2 and built for --target=arm-eabi.
I have seen the recursion on extract_bit_field, but not on store_bit_field
so far; maybe you could give me a hint which test case exposes the other
flavour of this recursion problem.


Sorry, I was going to describe that in more detail but I forgot.  It was 
compiling pr56997-1.c with -mthumb, like:


arm-none-eabi-gcc /path/to/gcc/testsuite/gcc.dg/pr56997-1.c -c 
-fno-diagnostics-show-caret   -fstrict-volatile-bitfields 
-DSTACK_SIZE=16384 -mthumb


Other testing with -mthumb looked fine.


Either way, still needs more work, and more thorough testing (more
targets, and obviously trunk as well as the 4.8 backport) before I'd
consider this ready to commit. I might or might not be able to find
some more time to work on this in the next week.


Yes.

And it would be good to reach Richard's Nov-21 deadline for GCC-4.9.


Yeah, I know the clock is ticking.  :-(

-Sandra



Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops

2013-10-29 Thread Jason Merrill

On 10/28/2013 05:48 PM, Tobias Burnus wrote:

I am not completely sure whether you had the following in mind, but
that's what I have now implemented:
   DEFTREECODE (RANGE_FOR_STMT, "range_for_stmt", tcc_statement, 4)
now has a fifth operand (RANGE_FOR_IVDEP), which has the value
boolean_true_node or NULL_TREE (via begin_range_for_stmt).


You don't need to add an operand, you can use one of the TREE_LANG_FLAGs 
for a boolean flag.
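
I.e., something like the following sketch (the flag slot is illustrative;
any free TREE_LANG_FLAG would do):

/* In cp-tree.h: use a spare lang flag on the RANGE_FOR_STMT node
   instead of a fifth operand.  */
#define RANGE_FOR_IVDEP(NODE) \
  TREE_LANG_FLAG_0 (RANGE_FOR_STMT_CHECK (NODE))

/* ...and in the parser, instead of storing boolean_true_node:
     RANGE_FOR_IVDEP (stmt) = 1;  */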


Jason



Re: [C++ Patch] PR 58888

2013-10-29 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Compute nonzero_bits from __builtin_unreachable assertions

2013-10-29 Thread Richard Biener
On Sat, 26 Oct 2013, Jakub Jelinek wrote:

> Hi!
> 
> And here is a patch that allows vectorization without peeling for alignment
> and scalar loop for bound even for fn2, fn3 and fn4 in the following
> testcase, though as with the range __builtin_unreachable () notes, it is
> quite fragile, because it only works if there are no immediate uses of the
> tested SSA_NAME before the assertion.  Perhaps more reliable way would be to
> convert those assertions info __builtin_assume_aligned, but that has the
> disadvantage that it's first argument is a pointer and it returns a pointer,
> so we'd need to cast integers to pointers and back, or add ASSUME_ALIGNED
> internal function.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.

Ok.

Thanks,
Richard.

> int a[1024];
> 
> void
> fn1 (int x, int y)
> {
>   int i;
>   x &= -32;
>   y &= -32;
>   for (i = x + 32; i < y; i++)
> a[i]++;
> }
> 
> void
> fn2 (int x, int y)
> {
>   int i;
>   if (x & 31)
> __builtin_unreachable ();
>   if (y & 31)
> __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
> a[i]++;
> }
> 
> void
> fn3 (int x, int y)
> {
>   int i;
>   if (x % 32)
> __builtin_unreachable ();
>   if (y % 32)
> __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
> a[i]++;
> }
> 
> void
> fn3 (int x, int y)
> {
>   int i;
>   if ((x % 32) != 0)
> __builtin_unreachable ();
>   if ((y % 32) != 0)
> __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
> a[i]++;
> }
> 
> 2013-10-25  Jakub Jelinek  
> 
>   * tree-vrp.c (maybe_set_nonzero_bits): New function.
>   (remove_range_assertions): Call it.
> 
> --- gcc/tree-vrp.c.jj 2013-10-24 14:32:29.0 +0200
> +++ gcc/tree-vrp.c2013-10-25 21:21:35.183092937 +0200
> @@ -6459,6 +6459,60 @@ check_all_array_refs (void)
>  }
>  }
>  
> +/* Handle
> +   _4 = x_3 & 31;
> +   if (_4 != 0)
> +     goto <bb 6>;
> +   else
> +     goto <bb 7>;
> +   <bb 6>:
> +   __builtin_unreachable ();
> +   <bb 7>:
> +   x_5 = ASSERT_EXPR <x_3, ...>;
> +   If x_3 has no other immediate uses (checked by caller),
> +   var is the x_3 var from ASSERT_EXPR, we can clear low 5 bits
> +   from the non-zero bitmask.  */
> +
> +static void
> +maybe_set_nonzero_bits (basic_block bb, tree var)
> +{
> +  edge e = single_pred_edge (bb);
> +  basic_block cond_bb = e->src;
> +  gimple stmt = last_stmt (cond_bb);
> +  tree cst;
> +
> +  if (stmt == NULL
> +  || gimple_code (stmt) != GIMPLE_COND
> +  || gimple_cond_code (stmt) != ((e->flags & EDGE_TRUE_VALUE)
> +  ? EQ_EXPR : NE_EXPR)
> +  || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME
> +  || !integer_zerop (gimple_cond_rhs (stmt)))
> +return;
> +
> +  stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
> +  if (!is_gimple_assign (stmt)
> +  || gimple_assign_rhs_code (stmt) != BIT_AND_EXPR
> +  || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST)
> +return;
> +  if (gimple_assign_rhs1 (stmt) != var)
> +{
> +  gimple stmt2;
> +
> +  if (TREE_CODE (gimple_assign_rhs1 (stmt)) != SSA_NAME)
> + return;
> +  stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
> +  if (!gimple_assign_cast_p (stmt2)
> +   || gimple_assign_rhs1 (stmt2) != var
> +   || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
> +   || (TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt)))
> +   != TYPE_PRECISION (TREE_TYPE (var
> + return;
> +}
> +  cst = gimple_assign_rhs2 (stmt);
> +  set_nonzero_bits (var, (get_nonzero_bits (var)
> +   & ~tree_to_double_int (cst)));
> +}
> +
>  /* Convert range assertion expressions into the implied copies and
> copy propagate away the copies.  Doing the trivial copy propagation
> here avoids the need to run the full copy propagation pass after
> @@ -6576,8 +6630,11 @@ remove_range_assertions (void)
>   }
>   }
>   if (ok)
> -   set_range_info (var, SSA_NAME_RANGE_INFO (lhs)->min,
> -   SSA_NAME_RANGE_INFO (lhs)->max);
> +   {
> + set_range_info (var, SSA_NAME_RANGE_INFO (lhs)->min,
> + SSA_NAME_RANGE_INFO (lhs)->max);
> + maybe_set_nonzero_bits (bb, var);
> +   }
> }
> }
>  
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


Re: [PATCH] Testcase for nonzero_bits & __builtin_unreachable

2013-10-29 Thread Richard Biener
On Tue, 29 Oct 2013, Jakub Jelinek wrote:

> On Sat, Oct 26, 2013 at 12:19:33AM +0200, Jakub Jelinek wrote:
> > And here is a patch that allows vectorization without peeling for alignment
> > and scalar loop for bound even for fn2, fn3 and fn4 in the following
> > testcase, though as with the range __builtin_unreachable () notes, it is
> > quite fragile, because it only works if there are no immediate uses of the
> > tested SSA_NAME before the assertion.  Perhaps more reliable way would be to
> > convert those assertions info __builtin_assume_aligned, but that has the
> > disadvantage that it's first argument is a pointer and it returns a pointer,
> > so we'd need to cast integers to pointers and back, or add ASSUME_ALIGNED
> > internal function.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> And here is as a follow-up to the whole patchset of 4 patches a testcase:

Ok.

Thanks,
Richard.

> 2013-10-29  Jakub Jelinek  
> 
>   * gcc.dg/vect/vect-align-3.c: New test.
> 
> --- gcc/testsuite/gcc.dg/vect/vect-align-3.c.jj   2013-10-29 
> 08:18:08.636348586 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-align-3.c  2013-10-29 08:21:14.931384138 
> +0100
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int a[2048];
> +
> +void
> +f1 (int x, int y)
> +{
> +  int i;
> +  x &= -256;
> +  y &= -256;
> +  for (i = x + 256; i < y; i++)
> +a[i]++;
> +}
> +
> +void
> +f2 (int x, int y)
> +{
> +  int i;
> +  if (x & 31)
> +__builtin_unreachable ();
> +  if (y & 31)
> +__builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +a[i]++;
> +}
> +
> +void
> +f3 (int x, int y)
> +{
> +  int i;
> +  if (x % 256)
> +__builtin_unreachable ();
> +  if (y % 256)
> +__builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +a[i]++;
> +}
> +
> +void
> +f4 (int x, int y)
> +{
> +  int i;
> +  if ((x % 256) != 0)
> +__builtin_unreachable ();
> +  if ((y % 256) != 0)
> +__builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +a[i]++;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "vect_do_peeling_for_loop_bound" "vect" } 
> } */
> +/* { dg-final { scan-tree-dump-not "loop peeled for vectorization" "vect" } 
> } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


[PATCH] Fix PR 58867: asan and ubsan tests not run for installed testing

2013-10-29 Thread Andrew Pinski
Hi,
  The problem here is that both the asan and ubsan testsuites test whether
we have set a library path before running the tests.  This is incorrect
when testing an already-installed compiler, as there is no path to set.

This patch changes them to test whether an executable can be built and
linked before running the tests.  It also fixes a warning/error when
restoring CXXFLAGS that shows up in the installed-testing case.

OK?  Bootstrapped and tested on x86_64-linux-gnu.  Also manually
looked into the log file to make sure both asan and ubsan were still
being tested.

Thanks,
Andrew Pinski

ChangeLog:
* lib/ubsan-dg.exp (check_effective_target_fundefined_sanitizer): New function.
(ubsan_init): Save off ALWAYS_CXXFLAGS.
(ubsan_finish): Restore ALWAYS_CXXFLAGS correctly.
* lib/asan-dg.exp (check_effective_target_faddress_sanitizer): Change
to creating an executable.
(asan_init): Save off ALWAYS_CXXFLAGS.
(asan_finish): Restore ALWAYS_CXXFLAGS correctly.
* gcc.dg/ubsan/ubsan.exp: Don't check the return value of ubsan_init.
Check check_effective_target_fundefined_sanitizer before running the
tests.
* g++.dg/ubsan/ubsan.exp: Likewise.
* testsuite/gcc.dg/asan/asan.exp: Don't check
check_effective_target_faddress_sanitizer too early. Don't check the
return value of asan_init.
* g++.dg/asan/asan.exp: Likewise.
Index: testsuite/lib/ubsan-dg.exp
===
--- testsuite/lib/ubsan-dg.exp  (revision 204138)
+++ testsuite/lib/ubsan-dg.exp  (working copy)
@@ -14,6 +14,15 @@
 # along with GCC; see the file COPYING3.  If not see
# <http://www.gnu.org/licenses/>.
 
+# Return 1 if compilation with -fsanitize=undefined is error-free for trivial
+# code, 0 otherwise.
+
+proc check_effective_target_fundefined_sanitizer {} {
+return [check_no_compiler_messages fundefined_sanitizer executable {
+   int main (void) { return 0; }
+} "-fsanitize=undefined"]
+}
+
 #
 # ubsan_link_flags -- compute library path and flags to find libubsan.
 # (originally from g++.exp)
@@ -60,6 +69,7 @@ proc ubsan_init { args } {
 global ALWAYS_CXXFLAGS
 global TOOL_OPTIONS
 global ubsan_saved_TEST_ALWAYS_FLAGS
+global ubsan_saved_ALWAYS_CXXFLAGS
 
 set link_flags ""
 if ![is_remote host] {
@@ -74,6 +84,7 @@ proc ubsan_init { args } {
set ubsan_saved_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS
 }
 if [info exists ALWAYS_CXXFLAGS] {
+   set ubsan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS
set ALWAYS_CXXFLAGS [concat "{ldflags=$link_flags}" $ALWAYS_CXXFLAGS]
 } else {
if [info exists TEST_ALWAYS_FLAGS] {
@@ -95,10 +106,15 @@ proc ubsan_init { args } {
 proc ubsan_finish { args } {
 global TEST_ALWAYS_FLAGS
 global ubsan_saved_TEST_ALWAYS_FLAGS
+global ubsan_saved_ALWAYS_CXXFLAGS
 
-if [info exists ubsan_saved_TEST_ALWAYS_FLAGS] {
-   set TEST_ALWAYS_FLAGS $ubsan_saved_TEST_ALWAYS_FLAGS
+if [info exists ubsan_saved_ALWAYS_CXXFLAGS] {
+   set ALWAYS_CXXFLAGS $ubsan_saved_ALWAYS_CXXFLAGS
 } else {
-   unset TEST_ALWAYS_FLAGS
+   if [info exists ubsan_saved_TEST_ALWAYS_FLAGS] {
+   set TEST_ALWAYS_FLAGS $ubsan_saved_TEST_ALWAYS_FLAGS
+   } else {
+   unset TEST_ALWAYS_FLAGS
+   }
 }
 }
Index: testsuite/lib/asan-dg.exp
===
--- testsuite/lib/asan-dg.exp   (revision 204138)
+++ testsuite/lib/asan-dg.exp   (working copy)
@@ -18,8 +18,8 @@
 # code, 0 otherwise.
 
 proc check_effective_target_faddress_sanitizer {} {
-return [check_no_compiler_messages faddress_sanitizer object {
-   void foo (void) { }
+return [check_no_compiler_messages faddress_sanitizer executable {
+   int main (void) { return 0; }
 } "-fsanitize=address"]
 }
 
@@ -69,6 +69,7 @@ proc asan_init { args } {
 global ALWAYS_CXXFLAGS
 global TOOL_OPTIONS
 global asan_saved_TEST_ALWAYS_FLAGS
+global asan_saved_ALWAYS_CXXFLAGS
 
 set link_flags ""
 if ![is_remote host] {
@@ -83,6 +84,7 @@ proc asan_init { args } {
set asan_saved_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS
 }
 if [info exists ALWAYS_CXXFLAGS] {
+   set asan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS
set ALWAYS_CXXFLAGS [concat "{ldflags=$link_flags}" $ALWAYS_CXXFLAGS]
set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=address -g}" 
$ALWAYS_CXXFLAGS]
 } else {
@@ -105,11 +107,16 @@ proc asan_init { args } {
 proc asan_finish { args } {
 global TEST_ALWAYS_FLAGS
 global asan_saved_TEST_ALWAYS_FLAGS
+global asan_saved_ALWAYS_CXXFLAGS
 
-if [info exists asan_saved_TEST_ALWAYS_FLAGS] {
-   set TEST_ALWAYS_FLAGS $asan_saved_TEST_ALWAYS_FLAGS
+if [info exists asan_saved_ALWAYS_CXXFLAGS] {
+   set ALWAYS_CXXFLAGS $asan_saved_ALWAYS_CXXFLAGS
 } else {
-   unset TEST_ALWAYS_FLAGS
+   if [info exists asan_saved_TEST_ALWAYS_FLAGS] {
+   set TEST_ALWAYS_FLAGS $asan_s

Re: [PATCH] Do not append " *INTERNAL* " to the decl name

2013-10-29 Thread Jason Merrill

On 10/28/2013 06:12 PM, Dehao Chen wrote:

ping...


Sorry for the slow response.

If we're actually emitting the name now, we need to give it a name 
different from the complete constructor.  I suppose it makes sense to go 
with C4/D4 as in the decloning patch,


http://gcc.gnu.org/ml/gcc-patches/2007-11/txt00046.txt

(which I've been meaning to revisit before the end of stage 1).

Jason



Re: free is a killer

2013-10-29 Thread Jeff Law

On 10/29/13 04:40, Richard Biener wrote:


Of course in the example we have the "global memory" storage class
(incoming function argument) and "malloc memory" which is really
the same storage class.  It only becomes a different storage class
if you factor in flow analysis (for which the current PTA code is weak
once you leave the SSA web).
The global memory storage class is really better thought of as the
"I haven't a clue" class.


You want a different class for things passed in as you know they can't 
conflict with anything in the current function's stack or anything 
malloc'd by the current function.
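
A small example of why that distinction pays off (illustrative, not from
the thread):

#include <stdlib.h>

int
f (int *p)                /* *p: incoming, caller-visible memory */
{
  int *q = malloc (sizeof (int));   /* q: fresh heap memory */
  *p = 1;
  *q = 2;                 /* cannot clobber *p: q's storage did not
                             exist when the caller formed p */
  int r = *p;             /* foldable to 1 if the classes are kept apart */
  free (q);
  return r;
}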



Jeff


Re: [PATCH 1/2] fix PR49847 ICE-on-HAVE_cc0 regression from PR50780 changes

2013-10-29 Thread Richard Biener
On Wed, Aug 28, 2013 at 10:06 PM, Jeff Law  wrote:
> On 08/28/2013 03:50 AM, Mikael Pettersson wrote:
>>
>> This patch fixes an ICE that occurs in #ifdef HAVE_cc0 code.  The ICE
>> breaks both Java and Ada bootstrap on m68k-linux.  There is also a
>> tiny C++ test case in the BZ entry.
>>
>> The ICE is triggered by the PR middle-end/50780 changes in r180192.
>> As explained in ,
>> the effect of those changes is that an expression in EH context is
>> broken up so that the cc0 setter and user are put in separate BBs, which
>> normally isn't allowed.  As fold_rtx sees the cc0 user, it calls
>> equiv_constant on prev_insn_cc0, but that is NULL due to the BB boundary,
>> resulting in the ICE.
>>
>> This patch checks if prev_insn_cc0 is NULL, and if so doesn't call
>> equiv_constant but sets const_arg to zero, which avoids the ICE and
>> makes fold_rtx leave the insn unchanged.
>>
>> Bootstrapped and regtested on m68k-linux, no regressions.  This patch
>> has been in 4.6-based production compilers on m68k-linux since early 2012,
>> and in a 4.7-based compiler since early 2013.
>>
>> Ok for trunk and 4.8?
>>
>> [If approved I'll need help to commit it as I don't have commit rights.]
>>
>> gcc/
>>
>> 2013-08-28  Mikael Pettersson  
>>
>> PR rtl-optimization/49847
>> * cse.c (fold_rtx) : If prev_insn_cc0 is NULL
>> don't call equiv_constant on it.
>
> I can't help but feel something different needs to be done here.   It's
> always been the case that those two insns are expected to be contiguous and
> there's going to be code all over the place that makes that assumption and
> that model is what every GCC developer has expected for the last 20+ years.
>
> What precisely about Richi's patch causes the setter and user to end up in
> different blocks?  I'm guessing we now consider the comparison itself as
> potentially trapping and thus we end the block immediately?  The consumer
> then appears in the next block?

Btw, the handling for COND_EXPRs was simply adjusted to be the same
as for if () conditions.

From looking at the testcase,

int f (float g)
{
  try { return g >= 0; }
  catch (...) {}
}

can't you simply avoid expanding throwing stuff to a set-reg-based-on-compare
instruction sequence?

Or why does the issue not trigger on

 try { if (g >= 0) return 1; else return 0; } catch (...) {}

?

Richard.

> I wonder if the comparison could be duplicated and marked as non-throwing at
> RTL generation time.   Alternately (and probably better) would be to convert
> the remaining stragglers and drop the old cc0 support entirely.
>
> jeff
>
>


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-29 Thread Andrew Pinski
On Mon, Oct 28, 2013 at 10:51 PM, Jeff Law  wrote:
> On 10/27/13 12:55, Andrew Pinski wrote:
>>
>> Here is my latest patch which adds the testcases from Zhenqiang's
>> patch and fixes item 1 and 2.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>> (ifcombine_ifandif): Handle cases where
>> maybe_fold_and_comparisons fails, combining the branches
>> anyways.
>> (tree_ssa_ifcombine): Inverse the order of
>> the basic block walk, increases the number of combinings.
>> * gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>>
>>
>> testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
>> * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>>
>> * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
>> conditional move to be used.
>> * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
>> intermediate".
>
> I don't be able to look at this in any depth tonight.  However, I would ask
> that you pass along the dump file for ssa-dom-thread-3.c so that I can
> evaluate the correctness of your change better.

We no longer jump thread one case as we have:

  :
  var_3 = var_1(D)->ssa_name.var;
  _4 = var_1(D)->base.code;
  if (_4 == 1)
goto ;
  else
goto ;

...

 :
  _6 = var_3->base.code;
  _13 = _4 != 1;
  _14 = _6 != 0;
  _12 = _13 & _14;
  if (_12 != 0)
goto ;
  else
goto ;

Attached is the full dump too.
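
For context, a source-level sketch of the shape being combined
(illustrative, not from the testsuite):

extern void g (void);

void
sketch (int a, int b)
{
  /* Two nested conditional branches...  */
  if (a != 1)
    if (b != 0)
      g ();
  /* ...which ifcombine turns into a single branch on a combined
     condition, matching the _13 & _14 sequence above:
       _13 = a != 1;  _14 = b != 0;  _12 = _13 & _14;
       if (_12 != 0) g ();  */
}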

Thanks,
Andrew Pinski



>
> Thanks,
> jeff
>


ssa-dom-thread-3.c.077t.dom1
Description: Binary data


Re: free is a killer

2013-10-29 Thread Richard Biener
On Tue, Oct 29, 2013 at 4:01 PM, Jeff Law  wrote:
> On 10/29/13 04:40, Richard Biener wrote:
>>
>>
>> Of course in the example we have the "global memory" storage class
>> (incoming function argument) and "malloc memory" which is really
>> the same storage class.  It only becomes a different storage class
>> if you factor in flow analysis (for which the current PTA code is weak
>> once you leave the SSA web).
>
> The global memory storage class is really better thought of as the "I
> haven't a clue" class.

No, that's the "anything" class in the current PTA.

> You want a different class for things passed in as you know they can't
> conflict with anything in the current function's stack or anything malloc'd
> by the current function.

That's the NONLOCAL class in the current PTA.  Which also means
that simple cases should work.  It's only when the malloc result
escapes somewhere that it gets reachable by NONLOCAL because
of the limited flow-sensitiveness of PTA (and its lack of context sensitivity).
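
A sketch of the escape case (illustrative):

#include <stdlib.h>

int *global_p;

int
g (int *p)
{
  int *q = malloc (sizeof (int));
  *q = 2;
  global_p = q;    /* q escapes: it is now reachable from global
                      memory, hence from NONLOCAL...  */
  *p = 1;          /* ...so without flow-sensitivity *p may be treated
                      as possibly aliasing *q...  */
  return *q;       /* ...and this is no longer safely foldable to 2.  */
}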

Richard.



>
> Jeff


Re: [wide-int] More optimisations

2013-10-29 Thread Kenneth Zadeck

On 10/29/2013 08:43 AM, Richard Sandiford wrote:

Richard Biener  writes:

On Sun, 27 Oct 2013, Richard Sandiford wrote:

This patch adds some more optimisations to the wi:: comparison functions.
It uses the:

   #define CONSTANT(X) (__builtin_constant_p (X) && (X))

idiom that was mentioned before, except that I thought CONSTANT would be
too easily confused with CONSTANT_P, so I went for CAN_TELL instead.
Better names welcome.

STATIC_CONSTANT_P similar to static_assert?

Sounds good.


The changes are:

- Add a fast path to eq_p for when one of the inputs isn't sign-extended.
   This includes code to handle compile-time 0 specially.

- Add the opposite optimisation to Mike's lts_p change, if we can tell at
   compile time that it applies.

I think the cases Mike added should only be enabled when we can figure
them out at compile-time, too.

Well, the idea with most of these functions was to handle the simple
"everything is one HWI" cases inline.  For operations like addition this
needs to be based on the precision, since that's the only cheap way of
knowing whether the result is also a single HWI.  But for logic operations
like AND and OR, it can be based on whether the inputs are single HWIs,
since single-HWI inputs give a single-HWI output.  This catches more cases.

lts_p is a bit like AND and OR in this respect.  fits_shwi_p is the
same as "len == 1", which is simple to check and is often true.  If we
restrict the optimisation to STATIC_CONSTANT_P we'll end up using the
out-of-line comparison for all TREE_INT_CST_LTs, even the int-int ones
that we currently handle inline.
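
(For reference, the idiom evaluates to true only when the compiler can
prove the condition at compile time -- a minimal C sketch; the function
names here are illustrative, not the actual wide-int API:)

#define STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))

extern int slow_path (void);

int
maybe_fast (int precision)
{
  /* If PRECISION is a compile-time constant <= 64, this folds to the
     fast path; otherwise the whole test folds to 0 and slow_path is
     called with no run-time check at all.  */
  if (STATIC_CONSTANT_P (precision <= 64))
    return 0;
  return slow_path ();
}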


This has always been my way of looking at it.

kenny

Thanks,
Richard





Re: [PATCH] Introduce [sg]et_nonzero_bits

2013-10-29 Thread Jakub Jelinek
On Tue, Oct 29, 2013 at 12:55:04PM +0100, Richard Biener wrote:
> Surely you can't rely on CCP and VRP compute exactly the same
> nonzero_bits.  As you don't record/compute zero_bits you can't
> tell whether a not set bit in nonzer_bits is "don't know" or
> if it is "zero".  And you cannot do an assert during iteration
> (during iteration optimistically 'wrong' values can disappear).
> 
> During CCP iteration the rule is that bits may only be added to mask
> (and obviously the constant itself on a CONSTANT lattice value may
> not change).

> Factor this out to commonize with code in evaluate_stmt

Ok.

> > + tem = val->mask.low;
> > + align = (tem & -tem);
> > + if (align > 1)
> > +   set_ptr_info_alignment (get_ptr_info (name), align,
> > +   (TREE_INT_CST_LOW (val->value)
> > +& (align - 1)));
> > +   }
> > +  else
> > +   {
> > + double_int nonzero_bits = val->mask;
> > + nonzero_bits = nonzero_bits | tree_to_double_int (val->value);
> > + nonzero_bits &= get_nonzero_bits (name);
> 
> This looks odd to me - shouldn't it be simply
> 
>nonzero_bits = ~val->mask & tree_to_double_int (val->value);

Bits set in the mask are for the bits where the value is unknown,
so potentially nonzero bits.  Where bits are clear in the mask,
but set in the value, those are surely nonzero bits.
The only known zero bits are where both mask and value are zero,
and as nonzero_bits is defined to be 1 where the bit may be non-zero,
0 where it must be zero (like it is for nonzero_bits* in RTL),
I think val->mask | tree_to_double_int (val->value); is exactly
what we want.
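
In other words (a minimal sketch of the rule, not code from the patch):

/* CCP lattice convention: a bit set in MASK is unknown; a bit clear in
   MASK takes its value from VALUE.  A bit may therefore be nonzero iff
   it is unknown or known to be one -- which is just MASK | VALUE.  */
static double_int
ccp_nonzero_bits (double_int value, double_int mask)
{
  return mask | value;
}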

> ?  &= of get_nonzero_bits either is not necessary or shows the
> lattice is unnecessarily conservative because you apply it too early
> during propagation?
> 
> > + set_nonzero_bits (name, nonzero_bits);
> > +   }
> >  }
> >  
> >/* Perform substitutions based on the known constant values.  */
> > @@ -1671,6 +1700,25 @@ evaluate_stmt (gimple stmt)
> >is_constant = (val.lattice_val == CONSTANT);
> >  }
> >  
> > +  if (flag_tree_bit_ccp
> > +  && !is_constant
> ^^^
> 
> ah, so if the bit-CCP part figured out a set of known bits
> then you don't do anything here while in fact you can
> still remove bits from mask with get_nonzero_bits info.

Yeah, that was not to lose info for the case where the lattice
is already CONSTANT.

Originally, I had code like:

--- tree-ssa-ccp.c.xx   2013-10-29 15:31:03.0 +0100
+++ tree-ssa-ccp.c  2013-10-29 15:34:44.257977817 +0100
@@ -1701,8 +1701,7 @@ evaluate_stmt (gimple stmt)
 }
 
   if (flag_tree_bit_ccp
-  && !is_constant
-  && likelyvalue != UNDEFINED
+  && (is_constant || likelyvalue != UNDEFINED)
   && gimple_get_lhs (stmt)
   && TREE_CODE (gimple_get_lhs (stmt)) == SSA_NAME)
 {
@@ -1711,11 +1710,27 @@ evaluate_stmt (gimple stmt)
   double_int mask = double_int::mask (TYPE_PRECISION (TREE_TYPE (lhs)));
   if (nonzero_bits != double_int_minus_one && nonzero_bits != mask)
{
- val.lattice_val = CONSTANT;
- val.value = build_zero_cst (TREE_TYPE (lhs));
- /* CCP wants the bits above precision set.  */
- val.mask = nonzero_bits | ~mask;
- is_constant = true;
+ if (!is_constant)
+   {
+ val.lattice_val = CONSTANT;
+ val.value = build_zero_cst (TREE_TYPE (lhs));
+ /* CCP wants the bits above precision set.  */
+ val.mask = nonzero_bits | ~mask;
+ is_constant = true;
+   }
+ else
+   {
+ double_int valv = tree_to_double_int (val.value);
+ gcc_assert ((valv & ~val.mask
+ & ~nonzero_bits & mask).is_zero ());
+ if (!(valv & ~nonzero_bits & mask).is_zero ())
+   val.value = double_int_to_tree (TREE_TYPE (lhs),
+   valv & nonzero_bits);
+ if (nonzero_bits.is_zero ())
+   val.mask = double_int_zero;
+ else
+   val.mask = val.mask & (nonzero_bits | ~mask);
+   }
}
 }
 
in the patch, but that breaks e.g. on
void
foo (unsigned int a, unsigned b, unsigned **c, void *d[8], int e)
{
  int i;
  for (i = 0; i != e; i++);
  if (a)
{
  for (i = 0;;)
{
  i--;
  if (bar ())
goto lab2;
}
lab1:;
  __builtin_printf ("%d\n", i < 0 ? -1 - i : i);
  return;
}
lab2:;
  if (!d[b] && *c)
goto lab1;
}
at -O2 - VRP1 figures out:
  if (i_3 < 0)
goto ;
  else
goto ;

  :
  # RANGE [0, 2147483647] NONZERO 0x000000007fffffff
  iftmp.0_23 = ~i_3;

  :
  # RANGE [0, 2147483647] NONZERO 0x000000007fffffff
  # iftmp.0_4 = PHI 
through the edge assertions, but CCP of course doesn't have edge
assertions and during iterations just at some point assumes
i_3 can be zero and processes bb 10 with that

Re: free is a killer

2013-10-29 Thread Jeff Law

On 10/29/13 09:15, Richard Biener wrote:

On Tue, Oct 29, 2013 at 4:01 PM, Jeff Law  wrote:

On 10/29/13 04:40, Richard Biener wrote:



Of course in the example we have the "global memory" storage class
(incoming function argument) and "malloc memory" which is really
the same storage class.  It only becomes a different storage class
if you factor in flow analysis (for which the current PTA code is weak
once you leave the SSA web).


The global memory storage class is really better thought of as the "I haven't
a clue" class.


No, that's the "anything" class in the current PTA.


You want a different class for things passed in as you know they can't
conflict with anything in the current function's stack or anything malloc'd
by the current function.


That's the NONLOCAL class in the current PTA.  Which also means
that simple cases should work.  It's only when the malloc result
escapes somewhere that it gets reachable by NONLOCAL because
of the limited flow-sensitiveness of PTA (and its lack of context sensitivity).
OK.  I think we've got all the same concepts and may just be arguing 
about terminology :-)


jeff



Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.

2013-10-29 Thread Jeff Law

On 10/29/13 06:10, Eric Botcazou wrote:

My reading of memory_address_addr_space is that MODE is the mode of the
memory reference, not the mode of the address.  I fail to see how
passing in the mode of the address in the first call can be correct.

What am I missing?


Nothing, it's me confusing

rtx memory_address_addr_space (enum machine_mode, rtx, addr_space_t);

with

rtx convert_memory_address_addr_space (enum machine_mode, rtx, addr_space_t);

for a long time apparently...  Sorry about that.  In the meantime, I installed
the obvious patch to eliminate the small redundancy.  Tested on x86_64/Linux.
No worries.  At one time I kept reading memory_address_addr_space_p and 
couldn't reconcile how the return value was used.  Eventually I realized 
I needed to be reading memory_address_addr_space instead.  Ugh.


Jeff



[patch] fix fallout from lto-streamer.h not including gimple.h

2013-10-29 Thread Andrew MacLeod
2 target files also included lto-streamer.h but did not include
gimple.h.  Fixed thusly.


Applied as revision 204166

Andrew



	* config/darwin.c: Include gimple.h.
	* config/i386/winnt.c: Likewise.

Index: config/darwin.c
===
*** config/darwin.c	(revision 204164)
--- config/darwin.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 45,50 
--- 45,51 
  #include "df.h"
  #include "debug.h"
  #include "obstack.h"
+ #include "gimple.h"
  #include "lto-streamer.h"
  
  /* Darwin supports a feature called fix-and-continue, which is used
Index: config/i386/winnt.c
===
*** config/i386/winnt.c	(revision 204164)
--- config/i386/winnt.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 35,40 
--- 35,41 
  #include "ggc.h"
  #include "target.h"
  #include "except.h"
+ #include "gimple.h"
  #include "lto-streamer.h"
  
  /* i386/PE specific attribute support.


Re: RFC/A: PR 58079: CONST_INT mode assert in combine.c:do_SUBST

2013-10-29 Thread Uros Bizjak
On Mon, Oct 28, 2013 at 5:28 PM, Uros Bizjak  wrote:

>> PR 58079 is about the do_SUBST assert:
>>
>>   /* Sanity check that we're replacing oldval with a CONST_INT
>> that is a valid sign-extension for the original mode.  */
>>   gcc_assert (INTVAL (newval)
>>  == trunc_int_for_mode (INTVAL (newval), GET_MODE (oldval)));
>>
>> triggering while trying to optimise the temporary result:
>>
>>   (eq (const_int -73 [0xffffffffffffffb7])
>>   (reg:QI 234 [ MEM[(const u8 *)buf_7(D) + 4B] ]))
>>
>> combine_simplify_rtx calls simplify_comparison, which first canonicalises
>> the order so that the constant is second and then promotes the comparison
>> to SImode (the first supported comparison mode on MIPS).  So we end up with:
>>
>>   (eq (zero_extend:SI (reg:QI 234 [ MEM[(const u8 *)buf_7(D) + 4B] ]))
>>   (const_int 183 [0xb7]))
>>
>> So far so good.  But combine_simplify_rtx then tries to install the
>> new operands in-place, with the second part of:
>>
>>  /* If the code changed, return a whole new comparison.  */
>>  if (new_code != code)
>>return gen_rtx_fmt_ee (new_code, mode, op0, op1);
>>
>>  /* Otherwise, keep this operation, but maybe change its operands.
>> This also converts (ne (compare FOO BAR) 0) to (ne FOO BAR).  */
>>  SUBST (XEXP (x, 0), op0);
>>  SUBST (XEXP (x, 1), op1);
>>
>> And this triggers the assert because we're replacing a QImode register
>> with the non-QImode constant 183.
>>
>> One fix would be to extend the new_code != code condition, as below.
>> This should avoid the problem cases without generating too much
>> garbage rtl.  But if the condition's getting this complicated,
>> perhaps it'd be better just to avoid SUBST here?  I.e.
>>
>>  if (new_code != code || op0 != XEXP (x, 0) || op1 != XEXP (x, 1))
>>return gen_rtx_fmt_ee (new_code, mode, op0, op1);
>>
>> Just asking though. :-)
>>
>> Tested on mipsisa64-elf.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>> PR rtl-optimization/58079
>> * combine.c (combine_simplify_rtx): Avoid using SUBST if
>> simplify_comparison has widened a comparison with an integer.
>>
>> gcc/testsuite/
>> * gcc.dg/torture/pr58079.c: New test.
>
> I would like to backport this patch to 4.8 branch. The patch fixes the
> failure on alpha-linux-gnu and also (reportedly) fixes the same
> failure on mips and pa targets.
>
> I have bootstrapped and regression test the patch on
> x86_64-pc-linux-gnu {,-m32} and started bootstrap and regtests on
> alphaev68-linux-gnu on 4.8 branch for all default languages + obj-c++
> and go. The latter will take some ~10 hours.
>
> OK to backport the patch to the 4.8 branch if both tests pass?

The regression test was OK in both cases, so I went ahead and
installed the patch. The fix is kind of obvious anyway.

Uros.


Re: free is a killer

2013-10-29 Thread Jeff Law

On 10/29/13 00:38, Marc Glisse wrote:


Indeed. Do you want to commit it xfailed or put it in bugzilla so we
don't lose it? (it becomes harder if you replace p with p-1 in the
memset arguments).
I'll either add it as xfailed, or I'll add it as-is if I fix the alias
code.  It'd primarily be to address the current incorrect state of the
code as it stands today rather than to extensively feature-test this code.





Sounds good, but I am not really up to rewriting the points-to analysis
at the moment...
Understandable.  Never hurts to put a bug in someone's ear, you never 
know when they (or someone else reading) will run with it.


jeff



Re: [PATCH] Use get_nonzero_bits to improve vectorization

2013-10-29 Thread Jakub Jelinek
On Tue, Oct 29, 2013 at 01:11:53PM +0100, Richard Biener wrote:
> > +/* Return number of known trailing zero bits in EXPR, or, if the value of
> > +   EXPR is known to be zero, the precision of its type.  */
> > +
> > +int
> 
> unsigned int?

Ok.

> > +case PLUS_EXPR:
> > +case MINUS_EXPR:
> > +case BIT_IOR_EXPR:
> > +case BIT_XOR_EXPR:
> > +case MIN_EXPR:
> > +case MAX_EXPR:
> > +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> 
> This first recurses but if either one returns 0 you don't have
> to (that cuts down the recursion from exponential to linear in
> the common case?).  Thus, early out on ret == 0?

Yeah, that is reasonable.  Usually we use this during expansion etc.,
trees won't be arbitrarily large and for SSA_NAMEs we don't recurse
on definitions, so I think we are unlikely to ever see very large
expressions there though.  I've added similar early out to the COND_EXPR
case.

> > +  return MIN (ret1, ret2);
> > +case POINTER_PLUS_EXPR:
> > +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > +  ret2 = MIN (ret2, prec);
> 
> Why do you need that here but not elsewhere when processing
> binary ops?

Because POINTER_PLUS_EXPR (well, also shifts, but for those we
don't call tree_ctz on the second argument) is the only
binop we handle that uses different types for the operands.
For the first argument we know it has the same precision as the result, but
what if sizetype has bigger precision than TYPE_PRECISION of pointers?
I know it typically doesn't; I just wanted to make sure we never return
an out-of-range value: the tree_ctz result should always be in [0, prec].
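
For reference, a condensed sketch of how the propagation rules in the
patch combine the operands' counts (the real tree_ctz recurses on the
operands to obtain CTZ1/CTZ2; the helper names here are mine):

static unsigned int
ctz_plus (unsigned int ctz1, unsigned int ctz2)
{
  return MIN (ctz1, ctz2);        /* x + y: common trailing zeros survive.  */
}

static unsigned int
ctz_mult (unsigned int ctz1, unsigned int ctz2, unsigned int prec)
{
  return MIN (ctz1 + ctz2, prec); /* x * y: the counts add, capped at prec.  */
}

static unsigned int
ctz_and (unsigned int ctz1, unsigned int ctz2)
{
  return MAX (ctz1, ctz2);        /* x & y: at least the larger count.  */
}

static unsigned int
ctz_exact_div_pow2 (unsigned int ctz1, unsigned int log2_d)
{
  /* x / 2^k: shifts the trailing zeros down by k.  */
  return ctz1 > log2_d ? ctz1 - log2_d : 0;
}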

> > +  return MIN (ret1, ret2);
> > +case BIT_AND_EXPR:
> > +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > +  return MAX (ret1, ret2);
> > +case MULT_EXPR:
> > +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +  ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > +  return MIN (ret1 + ret2, prec);
> > +case LSHIFT_EXPR:
> > +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +  if (host_integerp (TREE_OPERAND (expr, 1), 1)
> 
> check that first before recursing for op0 - if op1 is negative
> you simply return ret1 which looks wrong, too.

If op1 is negative, then it is undefined behavior, so assuming
in that case the same thing as when op1 is not constant (i.e.
that we worst case shift left by 0 and thus not increase number
of trailing zeros, or shift left by > 0 and increase it) is IMHO
correct.  I don't think we treat left shift by negative value as
right shift anywhere, do we?

> > + && ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1)
> > + < (unsigned HOST_WIDE_INT) prec))
> 
> This check is to avoid overflowing ret1 + ret2?  If so, why not add
> 
> > +   {
> > + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1);
> 
>   ret2 = MIN (ret2, prec);
> 
> instead?

Because ret{1,2} are int (well, with the change you've suggested unsigned
int), but tree_low_cst is UHWI, so if the shift count is say UINT_MAX + 1
(yeah, surely undefined behavior), I just didn't want to lose any of the upper
bits.  Sure, alternatively I could have an UHWI temporary variable and
assign tree_low_cst to it and do the MIN on that temporary and prec, but
still I think it is better to handle out of range constants as variable
shift count rather than say shift count up by prec - 1.

> > +case TRUNC_DIV_EXPR:
> > +case CEIL_DIV_EXPR:
> > +case FLOOR_DIV_EXPR:
> > +case ROUND_DIV_EXPR:
> > +case EXACT_DIV_EXPR:
> > +  if (TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST)
> > +   {
> > + ret2 = tree_log2 (TREE_OPERAND (expr, 1));
> > + if (ret2 >= 0 && tree_int_cst_sgn (TREE_OPERAND (expr, 1)) == 1)
> 
> cheaper to test the sign first.

Ok.
> 
> > +   {
> > + ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > + if (ret1 > ret2)
> > +   return ret1 - ret2;
> > +   }
> > +   }
> > +  return 0;
> > +CASE_CONVERT:
> > +  ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > +  if (ret1 && ret1 == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (expr, 
> > 0
> > +   ret1 = prec;
> > +  return MIN (ret1, prec);
> > +case SAVE_EXPR:
> > +  return tree_ctz (TREE_OPERAND (expr, 0));
> > +case COND_EXPR:
> > +  ret1 = tree_ctz (TREE_OPERAND (expr, 1));
> > +  ret2 = tree_ctz (TREE_OPERAND (expr, 2));
> > +  return MIN (ret1, ret2);
> > +case COMPOUND_EXPR:
> > +  return tree_ctz (TREE_OPERAND (expr, 1));
> > +case ADDR_EXPR:
> > +  ret1 = get_object_alignment (TREE_OPERAND (expr, 0));
> 
> Use get_pointer_alignment, this isn't a memory reference so type
> alignment rules don't apply.

Ok.

Jakub


Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping

2013-10-29 Thread Vladimir Makarov
On 10/25/2013 11:19 AM, Martin Jambor wrote:
> Hi,
>
> On Thu, Oct 24, 2013 at 01:02:51AM +0200, Steven Bosscher wrote:
>> On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote:
>>
>>>  /* Perform the second half of the transformation started in
>>> @@ -4522,7 +4704,15 @@ ira (FILE *f)
>>>   allocation because of -O0 usage or because the function is too
>>>   big.  */
>>>if (ira_conflicts_p)
>>> -find_moveable_pseudos ();
>>> +{
>>> +  df_analyze ();
>>> +  calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +  find_moveable_pseudos ();
>>> +  split_live_ranges_for_shrink_wrap ();
>>> +
>>> +  free_dominance_info (CDI_DOMINATORS);
>>> +}
>>>
>> You probably want to add another df_analyze if
>> split_live_ranges_for_shrink_wrap makes code transformations. AFAIU
>> find_moveable_pseudos doesn't change global liveness but your
>> transformation might. IRA/LRA need up-to-date DF_LR results to compute
>> allocno live ranges.
>>
> OK, I have changed the patch to fo that (it is below, still bootstraps
> and passes tests on x86_64 fine).  However, I have noticed that the
> corresponding part in function ira now looks like:
>
>   /* ... */
>   if (delete_trivially_dead_insns (get_insns (), max_reg_num ()))
> df_analyze ();
>
>   /* It is not worth to do such improvement when we use a simple
>  allocation because of -O0 usage or because the function is too
>  big.  */
>   if (ira_conflicts_p)
> {
>   df_analyze ();
>   calculate_dominance_info (CDI_DOMINATORS);
>
>   find_moveable_pseudos ();
>   if (split_live_ranges_for_shrink_wrap ())
>   df_analyze ();
>
>   free_dominance_info (CDI_DOMINATORS);
> }
>   /* ... */
>
> So, that left me wondering whether the first call to df_analyze is
> actually necessary, or whether perhaps the data are actually already
> up to date.  What do you think?
I guess it needs some investigation.  The delete_trivially_dead_insns code
was taken from the old RA.  First of all, I don't know how many insns
are really trivially dead before RA in optimization and non-optimization
modes.  Maybe the code can be removed altogether.  I'll put it on my todo list.
The patch is ok to commit.  Thanks for working on this, Martin.
>
> 2013-10-23  Martin Jambor  
>
>   PR rtl-optimization/10474
>   * ira.c (find_moveable_pseudos): Do not calculate dominance info
>   nor df analysis.
>   (interesting_dest_for_shprep): New function.
>   (split_live_ranges_for_shrink_wrap): Likewise.
>   (ira): Calculate dominance info and df analysis. Call
>   split_live_ranges_for_shrink_wrap.
>
> testsuite/
>   * gcc.dg/pr10474.c: New testcase.
>   * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
>   * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
>
>



Re: [Fwd: Re: [PATCH (updated)] Convert symtab, cgraph and varpool nodes into a real class hierarchy]

2013-10-29 Thread Jeff Law

On 10/29/13 10:03, David Malcolm wrote:


commit 739276306f1d8f5d0b1202da21cc29b5fcac102e
Author: David Malcolm
Date:   Mon Oct 28 22:25:38 2013 -0400

 Add chain_next and chain_prev options back to symtab_node_base

 gcc/
* cgraph.h (symtab_node_base): Add missing chain_next and
chain_prev GTY options.

This is fine.  Sorry I didn't review it at the same time as the other patch.

Jeff


Re: RE : Problem with _Hashtable_ebo_helper

2013-10-29 Thread Jonathan Wakely
This patch replaces the problematic use of _Hashtable_ebo_helper with
a new type.

2013-10-29  Jonathan Wakely  

* include/bits/hashtable.cc (__access_protected_ctor): Define and use
new type instead of _Hashtable_ebo_helper.
* testsuite/23_containers/unordered_set/
not_default_constructible_hash_neg.cc: Adjust line number.

Tested x86_64-linux, I'll commit it to trunk today.
commit 50786d1785911ef2d335fdcd162e71e14da4
Author: Jonathan Wakely 
Date:   Tue Oct 29 15:50:00 2013 +

* include/bits/hashtable.cc (__access_protected_ctor): Define and use
new type instead of _Hashtable_ebo_helper.
* testsuite/23_containers/unordered_set/
not_default_constructible_hash_neg.cc: Adjust line number.

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index c639c55..aae146b 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -277,14 +277,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"Functor used to map hash code to bucket index"
" must be default constructible");
 
+  // _Hash_code_base has a protected default constructor, so use this
+  // derived type to tell if it's usable.
+  struct __access_protected_ctor : __hash_code_base { };
+
   // When hash codes are not cached local iterator inherits from
   // __hash_code_base above to compute node bucket index so it has to be
   // default constructible.
   static_assert(__if_hash_not_cached<
-   is_default_constructible<
- // We use _Hashtable_ebo_helper to access the protected
- // default constructor.
- __detail::_Hashtable_ebo_helper<0, __hash_code_base, 
true>>>::value,
+   is_default_constructible<__access_protected_ctor>>::value,
"Cache the hash code or make functors involved in hash code"
" and bucket index computation default constructible");
 
diff --git 
a/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc
 
b/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc
index 3332cc5..4216b91 100644
--- 
a/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc
+++ 
b/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc
@@ -19,7 +19,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-error "default constructible" "" { target *-*-* } 283 }
+// { dg-error "default constructible" "" { target *-*-* } 287 }
 
 #include 
 


Re: patch to improve register preferencing in IRA and to *remove regmove* pass

2013-10-29 Thread Jeff Law

On 10/29/13 09:12, Vladimir Makarov wrote:

   Tomorrow I'd like to commit the following patch.

   The patch removes the regmove pass.

I'm buying next time we get together :-)  You have my eternal gratitude.




   I've found only one useful transformation in the regmove pass:

   dst = src                            dst = src (src dies)
   ...  no dst or src modification  =>  src changed on dst
   src dies                             ...

   It is a kind of value-numbering technique, decreasing register
   pressure by removing one live range.  It is not a frequently
   triggered transformation (about 30 hits in all of combine.c) and its
   effect is quite small, but there is no harm in it at all either.
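
   A hypothetical instance on pseudo-registers, just to make the
   picture concrete (not taken from the patch):

   ;; before                            ;; after
   (set (reg 200) (reg 100))            (set (reg 200) (reg 100))  ; reg 100 dies here
   (set (reg 300) (neg (reg 100)))      (set (reg 300) (neg (reg 200)))
   ;; reg 100 dies at its last use      ;; only reg 200 stays live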

   So I added the code to IRA practically without changes (it would
   be nice to do this in a more general way, e.g. not only in BB
   scope -- but I am not sure that it would make visible
   improvements and I have no time to try this currently).
I wouldn't be concerned about generalizing this too much.  I can't see 
how it's *that* important, and we can always come back to it if we 
decide it is important.





   Still, to achieve the same code performance without the regmove pass, I
needed to improve the code in IRA which implicitly replaces the removed
regmove transformations:

   o improving hard reg preferences.  As you know, RTL code can contain
 explicit hard regs.  Currently, a hard register in RTL code affects
 the costs of hard regs only for the pseudo involved in the hard register
 occurrence.  E.g.

[ ... ]
This sounds good and generally useful.



   o improving preference propagation of hard registers occurring in RTL
 and assigned to connected pseudo.  Let us look at the situation:

p1 - p2 - p3,  where '-' means a copy

 and we are assigning p1, p2, p3 in the same order.
 When we've just assigned hr1 to p1, we propagate the hr1 preference
 to p2 and p3.  When we assign to p2, we cannot assign hr1 for
 some reason and have to assign hr2.  P3 in the current preference
 implementation still has the hr1 preference, which is wrong.

 I implemented undoing preference propagation for such situations.

Ouch.  Again, sounds good to get this fixed.



   o Currently IRA generates copies too aggressively for operands that
   might be matched, so I rewrote this code to generate copies more
   accurately.

Also good :-)



   The changes in the testsuites are necessary as IRA/LRA now generate
different code (more accurately, better code, by removing register
shuffle moves in each case).

Excellent.


   So this patch removes a lot of code, decreases compilation time
(e.g. valgrind lackey reports about 0.4% fewer executed insns on
compiling GCC combine.i with -O2), generates code with about the same
performance (the best improvement I saw is a 0.5% SPEC2000
improvement on x86_64 in -O3 mode on a Haswell processor) and about
the same average code size for SPEC2000 (the differences are in the
hundredth-of-a-percent range).

No concerns here.




   It is a big change and I hope there are no serious objections to
this.  If somebody has them, please express them or inform me.

   Thanks, Vlad.

2013-10-28  Vladimir Makarov  

* regmove.c: Remove.
* tree-pass.h (make_pass_regmove): Remove.
* timevar.def (TV_REGMOVE): Remove.
* passes.def (pass_regmove): Remove.
* opts.c (default_options_table): Remove entry for regmove.
* doc/passes.texi: Remove regmove pass description.
* doc/invoke.texi (-foptimize-register-move, -fregmove): Remove
options.
(-fdump-rtl-regmove): Ditto.
* common.opt (foptimize-register-move, fregmove): Remove.
* Makefile.in (OBJS): Remove regmove.o.
* regmove.c: Remove.
* ira-int.h (struct ira_allocno_pref, ira_pref_t): New structure
and type.
(struct ira_allocno) New member allocno_prefs.
(ALLOCNO_PREFS): New macro.
(ira_prefs, ira_prefs_num): New external vars.
(ira_setup_alts, ira_get_dup_out_num, ira_debug_pref): New
prototypes.
(ira_debug_prefs, ira_debug_allocno_prefs, ira_create_pref):
Ditto.
(ira_add_allocno_pref, ira_remove_pref, ira_remove_allocno_prefs):
Ditto.
(ira_add_allocno_copy_to_list): Remove prototype.
(ira_swap_allocno_copy_ends_if_necessary): Ditto.
(ira_pref_iterator): New type.
(ira_pref_iter_init, ira_pref_iter_cond): New functions.
(FOR_EACH_PREF): New macro.
* ira.c (commutative_constraint_p): Move from ira-conflicts.c.
(ira_get_dup_out_num): Ditto. Rename from get_dup_num.  Modify the
code.
(ira_setup_alts): New function.
(decrease_live_ranges_number): New function.
(ira): Call the above function.
* ira-build.c (ira_prefs, ira_prefs_num): New global vars.
(ira_create_allocno): Initialize allocno prefs.
(pref_pool, pref_vec): New stati

Re: RFA: Andes nds32 port v4 patch

2013-10-29 Thread Joseph S. Myers
On Tue, 29 Oct 2013, Chung-Ju Wu wrote:

> Thank you very much for the comments on libgcc and documentation parts.
> Here I re-list the v4 patch of nds32 port:
> 
>   Machine Description --
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02153.html
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02154.html
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02156.html
>   Libgcc and Documentation --
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html
> http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html
> 
> Are those nds32 port patches OK to be committed?

This is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [patch] Mostly remove tree-core.h from other .h files.

2013-10-29 Thread Andrew MacLeod

On 10/28/2013 03:38 PM, Andrew MacLeod wrote:
I noticed there were a few .h files which were including tree-core.h
themselves.  expr.h was one of them, and many of the rtl files used
that to get at the basic tree structure for various bits.


I moved the include of tree-core.h to rtl.h instead.  That way all the 
rtl files get it, and when we flatten the rtl.h includes into the c 
files, it'll be reduced to just the ones that need it. Eventually, 
those things that need it from rtl.h won't even need tree-code.h, 
they'll get converted to the new gimple interface instead.


bah, looks like I broke some cross compiling.  Apparently some gen* 
routines include rtl.h before all-tree.def exists, and this causes 
breakage.  It doesn't happen during stage 1 of a cross, but later.

sigh.

I'll put tree-core.h back into expr.h, remove it from rtl.h and look to 
flatten it properly shortly...


bootstrapped on x86_64-unknown-linux-gnu and applied as revision 204167.

Andrew


	* expr.h: Revert change and include tree-core.h.
	* rtl.h: Revert change and don't include tree-core.h.

Index: expr.h
===
*** expr.h	(revision 204163)
--- expr.h	(working copy)
*** along with GCC; see the file COPYING3.  
*** 26,31 
--- 26,34 
  #include "rtl.h"
  /* For optimize_size */
  #include "flags.h"
+ /* For host_integerp, tree_low_cst, fold_convert, size_binop, ssize_int,
+TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/
+ #include "tree-core.h"
  /* For GET_MODE_BITSIZE, word_mode */
  #include "machmode.h"
  
Index: rtl.h
===
*** rtl.h	(revision 204163)
--- rtl.h	(working copy)
*** along with GCC; see the file COPYING3.  
*** 29,35 
  #include "alias.h"
  #include "hashtab.h"
  #include "flags.h"
- #include "tree-core.h"
  
  /* Value used by some passes to "recognize" noop moves as valid
   instructions.  */
--- 29,34 


Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.

2013-10-29 Thread Richard Henderson
On 10/29/2013 03:02 AM, Kirill Yukhin wrote:
> Hello Richard,
> 
> On 28 Oct 14:45, Richard Henderson wrote:
>> On 10/28/2013 01:58 PM, Kirill Yukhin wrote:
>>> Hello Richard,
>>> On 28 Oct 08:20, Richard Henderson wrote:
 Why is a masked *scalar* operation useful?
>>>
>>> The reason the instructions exist is so that
>>> you can do fully fault correct predicated scalar algorithms.
>>
>> Using VEC_MERGE isn't the proper representation for that.
>>
>> If that's your real goal, then COND_EXEC is the only way to let
>> rtl know that faults are suppressed in the false condition.
> 
> I believe cond_exec approach supposed to look like this:
>   (define_subst "mask_scalar"
> [(set (match_operand:SUBST_V 0)
>   (vec_merge:SUBST_V
> (match_operand:SUBST_V 1)
> (match_operand:SUBST_V 2)
> (const_int 1)))]
> "TARGET_AVX512F"
> [(cond_exec (eq:CC
>   (match_operand: 3 "register_operand" "k")
>   (const_int 1))
> (set (match_dup 0)
>  (vec_merge:SUBST_V
>(match_dup 1)
>(match_dup 2)
>(const_int 1))))])
> 
> But this will only describe merge-masking in an incorrect way.
> We will need to add a clobber to signal that even for the false
> condition we will zero the higher part of the register.
> Preferable zero-masking will be indistinguishable from merge-
> masking and we will need to choose which mask mode to enable.  Bad turn.

No, a cond_exec approach to scalars would use scalar modes
not vector modes with vec_merge.  In that case the higher
part of the register is ignored and undefined, and the fact
that zeroing happens is irrelevant.

> 
> IMHO, we have 3 options to implement scalar masked insns:
>   1. `vec_merge' over vec_merge (current approach).
>  Pro.
>1. Precise semantic description

False, as I've described above.  The compiler will believe
that all exceptions, especially memory exceptions, will be
taken.

>2. Unified approach with vector patterns
>3. Freedom for simplifier to reduce EVEX to VEX for
>certain const masks
>  Cons.
>1. Too precise semantic description and as a
>consequence complicated code in md-file
> 
>   2. `cond_exec' approach
> Pro.
>   1. Look useful for compiler when trying to generate
>   predicated code
> Cons.
>   1. Not precise. Extra clobbers (?) needed: to signal
>   that we're changing the register even for false
>   condition in cond_exec 
>   2. Unable to describe zero masking nicely
>   3. Code still complicated as for option #1
>   4. Simplifier won't work (clobber is always clobber)
> 
>   3. Make all masked scalar insns to be unspecs
> Pro.
>   1. Straight-forward, not overweighted. Enough for
>   intrinsics to work
> Cons.
>   1. Since every unspec needs a code: substs won't be
>   applied directly: huge volume of similar code
>   2. Simplifier won't work
>   3. Generation of predicated code become hard
> 
> Am I missing some options, or is that all we have?
> If so, what option would you prefer?

As far as builtins are concerned, all three approaches are
functional.  But in order for the compiler to be able to
automatically create conditional code from normal scalar
code we'll have to use cond_exec.

Indeed, if we've done our job right, then the user-facing
inlines that we expose could be written

  inline double add_mask(double r, double x, double y, int m)
  {
if (m & 1)
  r = x + y;
return r;
  }

although honestly such inlines would be silly.


Re: patch to improve register preferencing in IRA and to *remove regmove* pass

2013-10-29 Thread Alexander Monakov
Hello,

A very minor nit: in common.opt, entries for the options should be changed to

fregmove
Common Ignore
Does nothing. Preserved for backward compatibility.

instead of removing them altogether, so the compiler does not start rejecting
build commands with such options.  There are now a few such entries there.

Alexander


Re: patch to improve register preferencing in IRA and to *remove regmove* pass

2013-10-29 Thread Joseph S. Myers
Please change the .opt entries to

foptimize-register-move
Common Ignore
Does nothing. Preserved for backward compatibility.

and

fregmove
Common Ignore
Does nothing. Preserved for backward compatibility.

rather than removing them completely.  In general, when removing an option 
that's purely about optimizations rather than affecting the semantics of 
the user's source code, it's best to leave it behind as an ignored option 
like that.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: patch to improve register preferencing in IRA and to *remove regmove* pass

2013-10-29 Thread Vladimir Makarov
On 10/29/2013 12:26 PM, Jeff Law wrote:
> On 10/29/13 09:12, Vladimir Makarov wrote:
>>I've found only one useful transformation in the regmove pass:
>>
>>dst = src                            dst = src (src dies)
>>...  no dst or src modification  =>  src changed on dst
>>src dies                             ...
>>
>>It is a kind of value-numbering technique, decreasing register
>>pressure by removing one live range.  It is not a frequently
>>triggered transformation (about 30 hits in all of combine.c) and its
>>effect is quite small, but there is no harm in it at all either.
>>
>>So I added the code to IRA practically without changes (it would
>>be nice to do this in a more general way, e.g. not only in BB
>>scope -- but I am not sure that it would make visible
>>improvements and I have no time to try this currently).
> I wouldn't be concerned about generalizing this too much.  I can't see
> how it's *that* important, and we can always come back to it if we
> decide it is important.
>
I completely agree with you.  I don't expect any important outcome
from this work.  In fact, I did some experiments with GVN about 10 years
ago (how time flies) for the old RA and did not see any visible effect on
performance, but generated code always changes and maybe there are more
opportunities for this (although more probably fewer).  In
any case, it is the lowest-priority project for me and most probably I will
never start it.
 


Re: patch to improve register preferencing in IRA and to *remove regmove* pass

2013-10-29 Thread Vladimir Makarov
On 10/29/2013 12:43 PM, Alexander Monakov wrote:
> Hello,
>
> A very minor nit: in common.opt, entries for the options should be changed to
>
> fregmove
> Common Ignore
> Does nothing. Preserved for backward compatibility.
>
> instead of removing them altogether, so the compiler does not start rejecting
> build commands with such options.  There are now a few such entries there.
>
Thanks, Alex.  A good point.  I'll change it.



Re: patch to improve register preferencing in IRA and to *remove regmove* pass

2013-10-29 Thread Vladimir Makarov
On 10/29/2013 12:44 PM, Joseph S. Myers wrote:
> Please change the .opt entries to
>
> foptimize-register-move
> Common Ignore
> Does nothing. Preserved for backward compatibility.
>
> and
>
> fregmove
> Common Ignore
> Does nothing. Preserved for backward compatibility.
>
> rather than removing them completely.  In general, when removing an option 
> that's purely about optimizations rather than affecting the semantics of 
> the user's source code, it's best to leave it behind as an ignored option 
> like that.
>
Thanks, Joseph.  I modified common.opt accordingly.



Re: [GOOGLE] Don't update the callee count if caller is not resolved node

2013-10-29 Thread Dehao Chen
On Mon, Oct 28, 2013 at 9:49 PM, Xinliang David Li  wrote:
> Is it sufficient to check if the final caller is defined in primary module?

This might not be sufficient because the final caller may come from
comdat of aux-modules (not defined in the primary module).

>
> Note that in some cases, doing profile update 'speculatively' (without
> your fix) can be more precise (assuming the inlining gets materialized
> in a different compilation), but in general it might be better to be
> conservative in profile update (as in your patch) -- the downside is
> you may end up with less aggressive size optimization.

This is actually not a speculative profile update, but a double update.

E.g. comdat_foo()-->bar() with count 100

and comdat_foo() has been defined in 2 aux-modules. Then there will be
two edges from comdat_foo()-->bar(), each of which has the count 100.
But bar()'s entry count is only 100 (assume comdat_foo is the only
caller). Then if we update bar() twice when inlining these two edges,
the second update will be wrong.
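
Schematically, with the made-up numbers from the example above:

  bar () entry count:               100
  inline edge from aux-module 1:    100  ->  first update:  bar count -= 100  (now 0)
  inline edge from aux-module 2:    100  ->  second update: bar count -= 100  (now negative, wrong)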

Dehao

>
> David
>
> On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen  wrote:
>> This patch changes to not update the callee count if the caller is not
>> a resolved node (in LIPO mode) during AutoFDO compilation. This is
>> because AutoFDO will have the same edge counts for all unresolved
>> nodes. The original update method would lead to multiple updates of the callee.
>>
>> Bootstrapped and testing on going.
>>
>> OK for google-4_8 if test is OK?
>>
>> Thanks,
>> Dehao
>>
>> Index: gcc/ipa-inline-transform.c
>> ===
>> --- gcc/ipa-inline-transform.c (revision 204131)
>> +++ gcc/ipa-inline-transform.c (working copy)
>> @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>else
>>   {
>>struct cgraph_node *n;
>> +  if (flag_auto_profile && L_IPO_COMP_MODE
>> +  && cgraph_pre_profiling_inlining_done)
>> +{
>> +  struct cgraph_node *caller = e->caller;
>> +  if (caller->global.inlined_to)
>> + caller = caller->global.inlined_to;
>> +  if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller)
>> + update_original = false;
>> +}
>>n = cgraph_clone_node (e->callee, e->callee->symbol.decl,
>>   e->count, e->frequency,
>>   update_original, vNULL, true);


Re: [PATCH] Introduce [sg]et_nonzero_bits

2013-10-29 Thread Jakub Jelinek
Hi!

On Tue, Oct 29, 2013 at 04:29:56PM +0100, Jakub Jelinek wrote:
> > Surely you can't rely on CCP and VRP compute exactly the same
> > nonzero_bits.  As you don't record/compute zero_bits you can't
> > tell whether a not set bit in nonzer_bits is "don't know" or
> > if it is "zero".  And you cannot do an assert during iteration

Unset bits in get_nonzero_bits are always known to be zero; set bits
are either don't-know or known to be nonzero.

> > (during iteration optimistically 'wrong' values can disappear).
> > 
> > During CCP iteration the rule is that bits may only be added to mask
> > (and obviously the constant itself on a CONSTANT lattice value may
> > not change).

Here is an untested safer variant of the original patch that attempts
to honor the constraints you've mentioned by looking at get_value
and either making nonzero_bits more conservative or not applying it at
all if the above rules would be violated.  Still, not doing anything
for the !is_constant case, because I don't know what would be the right
thing.  Say, if for iteration we assume some constant value that has some
bits unset in mask (known) and set in value (known non-zero), yet they
are clear in get_nonzero_bits, shall we punt, or ignore what the iteration
would try to do and adjust from get_nonzero_bits instead, something else?

> 
> > Factor this out to commonize with code in evaluate_stmt
> 
> Ok.

But in this more conservative patch I don't see enough code to commonize
any more.

The only change in the patch below since the previous patch is in the
evaluate_stmt function:

2013-10-29  Jakub Jelinek  

* gimple-pretty-print.c (dump_ssaname_info): Print newline also
in case of VR_VARYING.  Print get_nonzero_bits if not all ones.
* tree-ssanames.h (struct range_info_def): Add nonzero_bits field.
(set_nonzero_bits, get_nonzero_bits): New prototypes.
* tree-ssa-ccp.c (get_default_value): Use get_range_info to see if
a default def isn't partially constant.
(ccp_finalize): If after IPA, set_range_info if integral SSA_NAME
is known to be partially zero.
(evaluate_stmt): If we'd return otherwise VARYING, use get_range_info
to see if a default def isn't partially constant.
* tree-ssanames.c (set_range_info): Initialize nonzero_bits upon
creation of a range, if VR_RANGE, try to improve nonzero_bits from
the range.
(set_nonzero_bits, get_nonzero_bits): New functions.

* g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure
we warn about it already during VRP1 pass.

--- gcc/gimple-pretty-print.c.jj2013-10-23 14:43:12.0 +0200
+++ gcc/gimple-pretty-print.c   2013-10-24 17:26:59.650945232 +0200
@@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe
   if (!POINTER_TYPE_P (TREE_TYPE (node))
   && SSA_NAME_RANGE_INFO (node))
 {
-  double_int min, max;
+  double_int min, max, nonzero_bits;
   value_range_type range_type = get_range_info (node, &min, &max);
 
   if (range_type == VR_VARYING)
@@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *buffe
  pp_printf (buffer, ", ");
  pp_double_int (buffer, max, TYPE_UNSIGNED (TREE_TYPE (node)));
  pp_printf (buffer, "]");
- newline_and_indent (buffer, spc);
}
+  nonzero_bits = get_nonzero_bits (node);
+  if (nonzero_bits != double_int_minus_one
+ && (nonzero_bits
+ != double_int::mask (TYPE_PRECISION (TREE_TYPE (node)
+   {
+ pp_string (buffer, " NONZERO ");
+ sprintf (pp_buffer (buffer)->digit_buffer,
+  HOST_WIDE_INT_PRINT_DOUBLE_HEX,
+  (unsigned HOST_WIDE_INT) nonzero_bits.high,
+  nonzero_bits.low);
+ pp_string (buffer, pp_buffer (buffer)->digit_buffer);
+   }
+  newline_and_indent (buffer, spc);
 }
 }
 
--- gcc/tree-ssanames.h.jj  2013-09-27 15:42:44.0 +0200
+++ gcc/tree-ssanames.h 2013-10-24 15:52:53.765955605 +0200
@@ -52,6 +52,8 @@ struct GTY (()) range_info_def {
   double_int min;
   /* Maximum for value range.  */
   double_int max;
+  /* Non-zero bits - bits not set are guaranteed to be always zero.  */
+  double_int nonzero_bits;
 };
 
 
@@ -68,10 +70,11 @@ struct GTY (()) range_info_def {
 enum value_range_type { VR_UNDEFINED, VR_RANGE, VR_ANTI_RANGE, VR_VARYING };
 
 /* Sets the value range to SSA.  */
-extern void set_range_info (tree ssa, double_int min, double_int max);
+extern void set_range_info (tree, double_int, double_int);
 /* Gets the value range from SSA.  */
-extern enum value_range_type  get_range_info (tree name, double_int *min,
- double_int *max);
+extern enum value_range_type get_range_info (tree, double_int *, double_int *);
+extern void set_nonzero_bits (tree, double_int);
+extern double_int get_nonzero_bits (tree);
 extern void init_ssanames (struct function *, int);
 exte

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-29 Thread Cong Hou
On Tue, Oct 29, 2013 at 1:38 AM, Uros Bizjak  wrote:
> Hello!
>
>> For the define_expand I added as below, the else body is there to
>> avoid fall-through transformations to ABS operation in optabs.c.
>> Otherwise ABS will be converted to other operations even that we have
>> corresponding instructions from SSSE3.
>
> No, it wont be.
>
> Fallthrough will generate the pattern that will be matched by the insn
> pattern above, just like you are doing by hand below.


I think the case is special for abs(). In optabs.c, there is a
function expand_abs() in which the function expand_abs_nojump() is
called. This function first tries the expand function defined for the
target and if it fails it will try max(v, -v) then shift-xor-sub
method. If I don't generate any instruction for SSSE3, the
fall-through will be max(v, -v). I have tested it on my machine.
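
For reference, the shift-xor-sub fallback mentioned above computes, for
a 32-bit int (a scalar sketch relying on GCC's arithmetic right shift
of signed values, not the optabs.c code itself):

int
abs32 (int x)
{
  int s = x >> 31;      /* 0 if x >= 0, -1 if x < 0.  */
  return (x ^ s) - s;   /* Flips the bits and adds 1 when negative.  */
}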


>
>> (define_expand "abs2"
>>   [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand")
>> (abs:VI124_AVX2_48_AVX512F
>>  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))]
>>   "TARGET_SSE2"
>> {
>>   if (!TARGET_SSSE3)
>> ix86_expand_sse2_abs (operands[0], force_reg (<MODE>mode, operands[1]));
>
> Do you really need force_reg here? You are using generic expanders in
> ix86_expand_sse2_abs that can handle non-registers operands just as
> well.

You are right. I have removed force_reg.


>
>>   else
>> emit_insn (gen_rtx_SET (VOIDmode, operands[0],
>>gen_rtx_ABS (<MODE>mode, operands[1])));
>>   DONE;
>> })
>
> Please note that your mailer mangles indents. Please indent your code 
> correctly.

Right.. I also attached a text file in which all tabs are there.


The updated patch is pasted below (and also in the attached file).
Thank you very much for your comment!


Cong




diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..84c7ab5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-22  Cong Hou  
+
+ PR target/58762
+ * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function.
+ * config/i386/i386.c (ix86_expand_sse2_abs): New function.
+ * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int).
+
 2013-10-14  David Malcolm  

  * dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 3ab2f3a..ca31224 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx,
rtx, rtx, bool, bool);
 extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
+extern void ix86_expand_sse2_abs (rtx, rtx);

 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 02cbbbd..71905fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
gen_rtx_MULT (mode, op1, op2));
 }

+void
+ix86_expand_sse2_abs (rtx op0, rtx op1)
+{
+  enum machine_mode mode = GET_MODE (op0);
+  rtx tmp0, tmp1;
+
+  switch (mode)
+{
+  /* For 32-bit signed integer X, the best way to calculate the absolute
+ value of X is (((signed) X >> (W-1)) ^ X) - ((signed) X >> (W-1)).  */
+  case V4SImode:
+ tmp0 = expand_simple_binop (mode, ASHIFTRT, op1,
+GEN_INT (GET_MODE_BITSIZE
+ (GET_MODE_INNER (mode)) - 1),
+NULL, 0, OPTAB_DIRECT);
+ if (tmp0)
+  tmp1 = expand_simple_binop (mode, XOR, op1, tmp0,
+  NULL, 0, OPTAB_DIRECT);
+ if (tmp0 && tmp1)
+  expand_simple_binop (mode, MINUS, tmp1, tmp0,
+   op0, 0, OPTAB_DIRECT);
+ break;
+
+  /* For 16-bit signed integer X, the best way to calculate the absolute
+ value of X is max (X, -X), as SSE2 provides the PMAXSW insn.  */
+  case V8HImode:
+ tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+ if (tmp0)
+  expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0,
+   OPTAB_DIRECT);
+ break;
+
+  /* For 8-bit signed integer X, the best way to calculate the absolute
+ value of X is min ((unsigned char) X, (unsigned char) (-X)),
+ as SSE2 provides the PMINUB insn.  */
+  case V16QImode:
+ tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+ if (tmp0)
+  expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0,
+   OPTAB_DIRECT);
+ break;
+
+  default:
+ break;
+}
+}
+
 /* Expand an insert into a vector register through pinsr insn.
Return true if successful.  */

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c3f6c94..0d9cefe 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8721,7 +8721,7 @@
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
(set_attr "mode" "DI")])

-(define_insn "abs2"
+(define_insn "*abs2"
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v")
  (abs:VI124_AVX2_48_AVX512F
   (match_operand:

Re: patch to improve register preferencing in IRA and to *remove regmove* pass

2013-10-29 Thread Jeff Law

On 10/29/13 10:46, Vladimir Makarov wrote:

On 10/29/2013 12:26 PM, Jeff Law wrote:

On 10/29/13 09:12, Vladimir Makarov wrote:

I've found only one useful transformation in the regmove pass:

dst = src                            dst = src (src dies)
...  no dst or src modification  =>  src changed on dst
src dies                             ...

It is a kind of value-numbering technique, decreasing register
pressure by removing one live range.  It is not a frequently
triggered transformation (about 30 hits in all of combine.c) and its
effect is quite small, but there is no harm in it at all either.

So I added the code to IRA practically without changes (it would
be nice to do this in a more general way, e.g. not only in BB
scope -- but I am not sure that it would make visible
improvements and I have no time to try this currently).

I wouldn't be concerned about generalizing this too much.  I can't see
how it's *that* important, and we can always come back to it if we
decide it is important.


I completely agree with you.  I don't expect any important outcome
from this work.  In fact, I did some experiments with GVN about 10 years
ago (how time flies) for the old RA and did not see any visible effect on
performance, but generated code always changes and maybe there are more
opportunities for this (although more probably fewer).  In
any case, it is the lowest-priority project for me and most probably I will
never start it.
I vaguely recall you talking about GVN in the allocator once -- perhaps 
at one of the summits.  And yes, I recall you saying it wasn't 
particularly helpful.


Jeff


Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-29 Thread Uros Bizjak
On Tue, Oct 29, 2013 at 6:18 PM, Cong Hou  wrote:

>>> For the define_expand I added as below, the else body is there to
>>> avoid fall-through transformations to ABS operation in optabs.c.
>>> Otherwise ABS will be converted to other operations even that we have
>>> corresponding instructions from SSSE3.
>>
>> No, it wont be.
>>
>> Fallthrough will generate the pattern that will be matched by the insn
>> pattern above, just like you are doing by hand below.
>
>
> I think the case is special for abs(). In optabs.c, there is a
> function expand_abs() in which the function expand_abs_nojump() is
> called. This function first tries the expand function defined for the
> target and if it fails it will try max(v, -v) then shift-xor-sub
> method. If I don't generate any instruction for SSSE3, the
> fall-through will be max(v, -v). I have tested it on my machine.

Huh, strange.

Then you can rename the previous pattern to abs<mode>2_1 and call it from
the new expander instead of expanding it manually. Please also add a
small comment, describing the situation to prevent future
"optimizations" in this place.

Thanks,
Uros.


Re: [PATCH] Do not append " *INTERNAL* " to the decl name

2013-10-29 Thread Dehao Chen
On Tue, Oct 29, 2013 at 7:58 AM, Jason Merrill  wrote:
> On 10/28/2013 06:12 PM, Dehao Chen wrote:
>>
>> ping...
>
>
> Sorry for the slow response.
>
> If we're actually emitting the name now, we need to give it a name different
> from the complete constructor.  I suppose it makes sense to go with C4/D4 as
> in the decloning patch,

Shall we do it in a separate patch? And I suppose binutils also need
to be updated for C4/D4?

Thanks,
Dehao

>
> http://gcc.gnu.org/ml/gcc-patches/2007-11/txt00046.txt
>
> (which I've been meaning to revisit before the end of stage 1).
>
> Jason
>


Re: [GOOGLE] Don't update the callee count if caller is not resolved node

2013-10-29 Thread Xinliang David Li
The situation you described is worse -- hopefully it will be addressed
in the next version of lipo.

The change is ok.

David

On Tue, Oct 29, 2013 at 10:08 AM, Dehao Chen  wrote:
> On Mon, Oct 28, 2013 at 9:49 PM, Xinliang David Li  wrote:
>> Is it sufficient to check if the final caller is defined in primary module?
>
> This might not be sufficient because the final caller may come from
> comdat of aux-modules (not defined in the primary module).
>
>>
>> Note that in some cases, doing profile update 'speculatively' (without
>> your fix) can be more precise (assuming the inlining gets materialized
>> in a different compilation), but in general it might be better to be
>> conservative in profile update (as in your patch) -- the downside is
>> you may end up with less aggressive size optimization.
>
> This is actually not a speculative profile update, but a double update.
>
> E.g. comdat_foo()-->bar() with count 100
>
> and comdat_foo() has been defined in 2 aux-modules. Then there will be
> two edges from comdat_foo()-->bar(), each of which has the count 100.
> But bar()'s entry count is only 100 (assume comdat_foo is the only
> caller). Then if we update bar() twice when inline these two edges,
> the second update will be wrong.
>
> Dehao
>
>>
>> David
>>
>> On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen  wrote:
>>> This patch changes to not update the callee count if the caller is not
>>> a resolved node (in LIPO mode) during AutoFDO compilation. This is
>>> because AutoFDO will have the same edge counts for all unresolved
>>> nodes. The original update method would lead to multiple updates of the callee.
>>>
>>> Bootstrapped and testing on going.
>>>
>>> OK for google-4_8 if test is OK?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> Index: gcc/ipa-inline-transform.c
>>> ===
>>> --- gcc/ipa-inline-transform.c (revision 204131)
>>> +++ gcc/ipa-inline-transform.c (working copy)
>>> @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>else
>>>   {
>>>struct cgraph_node *n;
>>> +  if (flag_auto_profile && L_IPO_COMP_MODE
>>> +  && cgraph_pre_profiling_inlining_done)
>>> +{
>>> +  struct cgraph_node *caller = e->caller;
>>> +  if (caller->global.inlined_to)
>>> + caller = caller->global.inlined_to;
>>> +  if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller)
>>> + update_original = false;
>>> +}
>>>n = cgraph_clone_node (e->callee, e->callee->symbol.decl,
>>>   e->count, e->frequency,
>>>   update_original, vNULL, true);


Re: [PATCH] tree-ssa documetation fix

2013-10-29 Thread Jeff Law

On 10/29/13 03:53, Martin Liška wrote:

Hello,
   I've noticed that some parts of the documentation are obsolete, and this
patch adds documentation for the new functions that replace old macros.

[ ... ]
Thanks.  Installed on the trunk.

jeff


Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-29 Thread Jeff Law

On 10/29/13 09:14, Andrew Pinski wrote:

On Mon, Oct 28, 2013 at 10:51 PM, Jeff Law  wrote:

On 10/27/13 12:55, Andrew Pinski wrote:


Here is my latest patch which adds the testcases from Zhenqiang's
patch and fixes item 1 and 2.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
(ifcombine_ifandif): Handle cases where
maybe_fold_and_comparisons fails, combining the branches
anyways.
(tree_ssa_ifcombine): Inverse the order of
the basic block walk, increases the number of combinings.
* gimple.h (gsi_start_nondebug_after_labels_bb): New function.


testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.

* gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
conditional move to be used.
* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more
intermediate".


I won't be able to look at this in any depth tonight.  However, I would ask
that you pass along the dump file for ssa-dom-thread-3.c so that I can
evaluate the correctness of your change better.


We no longer jump thread one case as we have:

   :
   var_3 = var_1(D)->ssa_name.var;
   _4 = var_1(D)->base.code;
   if (_4 == 1)
 goto ;
   else
 goto ;

...

  :
   _6 = var_3->base.code;
   _13 = _4 != 1;
   _14 = _6 != 0;
   _12 = _13 & _14;
   if (_12 != 0)
 goto ;
   else
 goto ;

Attached is the full dump too.
I mostly wanted to verify whether or not the test was worth running. 
It isn't after this change.  It no longer tests for the threader's 
ability to clone an intermediate block to isolate a path to a successor 
of the intermediate that is threadable.



Please just remove ssa-dom-thread-3.c as part of this checkin.

Thanks,

Jeff
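
For reference, a minimal example of the transformation under discussion
(a sketch; the function name is invented):

  int
  both_positive (int a, int b)
  {
    if (a > 0)
      if (b > 0)
        return 1;
    return 0;
  }

With the patch, ifcombine merges the two conditional branches into a single
test of the form _1 = a > 0; _2 = b > 0; if ((_1 & _2) != 0), matching the
_13 & _14 combination visible in the dump above.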



Re: [RFC] [Testsuite,ARM] Neon intrinsics executable tests

2013-10-29 Thread Christophe Lyon
On 29 October 2013 03:24, Ramana Radhakrishnan  wrote:
> On 10/09/13 23:16, Christophe Lyon wrote:

> Irrespective of our earlier conversations on this now I'm actually wondering
> if instead of doing this and integrating this in the GCC source base it
> maybe easier to write a harness to test this cross on qemu or natively.
> Additionally setting up an auto-tester to do this might be a more productive
> use of time rather than manually dejagnuizing this which appears to be a
> tedious and slow process.

This would be easy to set up, since the Makefile on gitorious is
already targeting qemu. I used it occasionally on boards with
minimal changes.
This just means we'd have to agree on how to set up such an
auto-tester, where we'd send the results, etc...

>> I'd like your feedback before continuing, as there are a lot more
>> files to come.
>>
>> I have made some cleanup to help review, but the two .h files will
>> need to grow as more intrinsics will be added (see the original ones).
>
> Which one should I compare this with in terms of the original file ?

I have kept the same file names.


>> I'd like to keep the modifications at a minimal level, to save my time
>> when adapting each test (there are currently 145 test files, so 143
>> left:-).
>
>
> On to the patch itself.
>
> The prefix TEST_ seems a bit misleading in that it suggests this is testing
> something when in reality this is initializing stuff.
In fact, TEST_ executes the intrinsics, and copies the
results to memory when relevant. But I can easily change TEST_ to
something else.

So in the sample I posted:

TEST_VABA: VAR=vaba(); vst1(BUFFER,VAR)
TEST_VLD1: VAR=vld1(); vst1(BUFFER, VAR)

VDUP is special in that it is a helper for other tests:
TEST_VDUP: VAR1=vdup(VAR2,)
and similarly for TEST_VLOAD and TEST_VSETLANE
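
Hand-expanded, one TEST_VABA instantiation has roughly this shape (a sketch
following the VECT_VAR naming convention from arm-neon-ref.h below; the
exact generated names may differ):

  /* Execute the intrinsic, then store the result vector to memory so
     that CHECK can compare it against the expected values.  */
  int8x8_t vector_res_int8x8 = vaba_s8 (vector1_int8x8,
                                        vector2_int8x8,
                                        vector3_int8x8);
  vst1_s8 (result_int8x8, vector_res_int8x8);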

>> +# Exit immediately if this isn't an ARM target.
>> +if ![istarget arm*-*-*] then {
>> +  return
>> +}
>
>
> Also for aarch64*-*-* as all these intrinsics are compatible with the
> aarch64 port. I would also prefer that this be tortured over multiple
> optimization levels, as we often find issues at different optimization
> levels.

OK, this sounds easy to do, and I agree. I preferred to post a simple
version first.
And given you talked me about your plans to factorize arm and aarch64
tests, I thought it was better to start with a simple version I knew
was working.

> More later; I need to get back to something else and I need to play more with
> your original testsuite - but I'd like some discussion around some of these
> points anyway.
>
> Ramana

OK thanks for the feedback. If we decide to go with auto-testers
instead, the discussion will probably be shorter.

Christophe

>
>> +
>> +# Load support procs.
>> +load_lib gcc-dg.exp
>> +
>> +# Initialize `dg'.
>> +dg-init
>> +
>> +# Main loop.
>> +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
>> +   ""
>> +
>> +# All done.
>> +dg-finish
>> diff -rNup '--exclude=.git'
>> gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h
>> gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h
>> --- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h
>> 1970-01-01 01:00:00.0 +0100
>> +++
>> gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h
>> 2013-05-09 00:48:59.395628726 +0200
>> @@ -0,0 +1,349 @@
>> +#ifndef _ARM_NEON_REF_H_
>> +#define _ARM_NEON_REF_H_
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define xSTR(X) #X
>> +#define STR(X) xSTR(X)
>> +
>> +#define xNAME1(V,T) V ## _ ##  T
>> +#define xNAME(V,T) xNAME1(V,T)
>> +
>> +#define VAR(V,T,W) xNAME(V,T##W)
>> +#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W)
>> +
>> +#define VECT_NAME(T, W, N) T##W##x##N
>> +#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L
>> +#define VECT_TYPE(T, W, N) xNAME(VECT_NAME(T,W,N),t)
>> +#define VECT_ARRAY_TYPE(T, W, N, L) xNAME(VECT_ARRAY_NAME(T,W,N,L),t)
>> +
>> +#define VECT_VAR(V,T,W,N) xNAME(V,VECT_NAME(T,W,N))
>> +#define VECT_VAR_DECL(V, T, W, N) T##W##_t VECT_VAR(V,T,W,N)
>> +
>> +/* Array declarations.  */
>> +#define ARRAY(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[N]
>> +
>> +/* Check results vs expected values.  */
>> +#define CHECK(MSG,T,W,N,FMT)   \
>> +  for(i=0; i<N; i++)  \
>> +{  \
>> +  if (VECT_VAR(result, T, W, N)[i] != VECT_VAR(expected, T, W, N)[i])
>> { \
>> +   fprintf(stderr, \
>> +   "ERROR in %s at type %s index %d: 0x%" FMT " != 0x%"\
>> +   FMT " (expected)\n",\
>> +   MSG, STR(VECT_NAME(T, W, N)), i,\
>> +   VECT_VAR(result, T, W, N)[i],   \
>> +   VECT_VAR(expected, T, W, N)[i]);\
>> +   abort();  
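
Hand-expanding the naming macros above for one type makes the scheme
clearer (assuming T=int, W=8, N=8):

  VECT_NAME(int, 8, 8)             -> int8x8
  VECT_TYPE(int, 8, 8)             -> int8x8_t
  VECT_VAR(result, int, 8, 8)      -> result_int8x8
  VECT_VAR_DECL(result, int, 8, 8) -> int8_t result_int8x8
  ARRAY(result, int, 8, 8)         -> int8_t result_int8x8[8]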

Re: [PATCH] Vectorizing abs(char/short/int) on x86.

2013-10-29 Thread Cong Hou
On Tue, Oct 29, 2013 at 10:34 AM, Uros Bizjak  wrote:
> On Tue, Oct 29, 2013 at 6:18 PM, Cong Hou  wrote:
>
 For the define_expand I added as below, the else body is there to
 avoid fall-through transformations to ABS operation in optabs.c.
 Otherwise ABS will be converted to other operations even that we have
 corresponding instructions from SSSE3.
>>>
>>> No, it wont be.
>>>
>>> Fallthrough will generate the pattern that will be matched by the insn
>>> pattern above, just like you are doing by hand below.
>>
>>
>> I think the case is special for abs(). In optabs.c, there is a
>> function expand_abs() in which the function expand_abs_nojump() is
>> called. This function first tries the expand function defined for the
>> target, and if that fails it tries max(v, -v) and then the
>> shift-xor-sub method. If I don't generate any instruction for SSSE3, the
>> fall-through will be max(v, -v). I have tested it on my machine.
>
> Huh, strange.
>
> Then you can rename the previous pattern to abs<mode>2_1 and call it from
> the new expander instead of expanding it manually. Please also add a
> small comment, describing the situation to prevent future
> "optimizations" in this place.

Could you tell me how to do that? Is the renamed pattern abs<mode>2_1
also a "define_expand"? How do I call this expander?

Thank you!


Cong



>
> Thanks,
> Uros.
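
A minimal sketch of what Uros is suggesting (hypothetical shape, not the
final patch): keep the SSSE3 define_insn under the new name abs<mode>2_1,
and let the C body of the abs<mode>2 define_expand emit it through its
generator function. Iterator substitution also applies inside the C block,
so gen_abs<mode>2_1 resolves to gen_absv16qi2_1 and friends:

  /* C body of the abs<mode>2 expander (sketch).  */
  if (TARGET_SSSE3)
    emit_insn (gen_abs<mode>2_1 (operands[0], operands[1]));
  else
    ix86_expand_sse2_abs (operands[0], operands[1]);
  DONE;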


[PATCH] Fix PR ipa/58862 (profiled bootstrap failure)

2013-10-29 Thread Teresa Johnson
This patch fixes a profiledbootstrap failure that occurred after I
added some profile fixup. The initial profile insanity occurred
upstream of my change, but my change caused the insanity to spread to
the edge probabilities, resulting in a verify failure. The patch below
ensures this doesn't occur.

Bootstrapped and tested on x86_64-unknown-linux-gnu. A
profiledbootstrap build also now succeeds. Ok for trunk?

Thanks,
Teresa

2013-10-29  Teresa Johnson  

PR ipa/58862
* tree-ssa-tail-merge.c (replace_block_by): Tolerate profile
insanities when updating probabilities.

Index: tree-ssa-tail-merge.c
===
--- tree-ssa-tail-merge.c   (revision 204166)
+++ tree-ssa-tail-merge.c   (working copy)
@@ -1467,7 +1467,7 @@ static void
 replace_block_by (basic_block bb1, basic_block bb2)
 {
   edge pred_edge;
-  edge e1;
+  edge e1, e2;
   edge_iterator ei;
   unsigned int i;
   gimple bb2_phi;
@@ -1502,16 +1502,22 @@ replace_block_by (basic_block bb1, basic_block bb2
   bb2->count += bb1->count;

   /* Merge the outgoing edge counts from bb1 onto bb2.  */
+  gcov_type out_sum = 0;
   FOR_EACH_EDGE (e1, ei, bb1->succs)
 {
-  edge e2;
   e2 = find_edge (bb2, e1->dest);
   gcc_assert (e2);
   e2->count += e1->count;
-  /* Recompute the probability from the new merged edge count (bb2->count
- was updated above).  */
-  e2->probability = GCOV_COMPUTE_SCALE (e2->count, bb2->count);
+  out_sum += e2->count;
 }
+  /* Recompute the edge probabilities from the new merged edge count.
+ Use the sum of the new merged edge counts computed above instead
+ of bb2's merged count, in case there are profile count insanities
+ making the bb count inconsistent with the edge weights.  */
+  FOR_EACH_EDGE (e2, ei, bb2->succs)
+{
+  e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
+}

   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
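
The failure mode this guards against, with invented numbers (plain integer
division, ignoring GCOV_COMPUTE_SCALE's rounding):

  /* bb2->count is insanely 90 after the merge, while the merged
     outgoing edge counts are 60 and 40; REG_BR_PROB_BASE is 10000.  */
  int p1 = 60 * 10000 / 90;   /* 6666 */
  int p2 = 40 * 10000 / 90;   /* 4444 -> p1 + p2 = 11110 > 10000: verify fails */

  int q1 = 60 * 10000 / 100;  /* 6000: scale by out_sum = 60 + 40 instead */
  int q2 = 40 * 10000 / 100;  /* 4000 -> q1 + q2 = 10000 exactly */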


Re: [PATCH] Keep REG_INC note in subreg2 pass

2013-10-29 Thread Jeff Law

On 10/24/13 02:20, Zhenqiang Chen wrote:

Hi,

The REG_INC note is lost in the subreg2 pass in resolve_simple_move, which
might lead to wrong dependences for IRA, e.g. in the function
validate_equiv_mem of ira.c, which checks the REG_INC note:

  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
 if ((REG_NOTE_KIND (note) == REG_INC
  || REG_NOTE_KIND (note) == REG_DEAD)
 && REG_P (XEXP (note, 0))
 && reg_overlap_mentioned_p (XEXP (note, 0), memref))
   return 0;

Without the REG_INC note, validate_equiv_mem will return a wrong result.

Refer to https://bugs.launchpad.net/gcc-linaro/+bug/1243022 for more
detail about a real case in the kernel.

Bootstrap and no make check regression on X86-64 and ARM.

Is it OK for trunk and 4.8?

Thanks!
-Zhenqiang

ChangeLog:
2013-10-24  Zhenqiang Chen

 * lower-subreg.c (resolve_simple_move): Copy REG_INC note.

testsuite/ChangeLog:
2013-10-24  Zhenqiang Chen

 * gcc.target/arm/lp1243022.c: New test.
This clearly handles adding a note when the destination is a MEM with a 
side effect.  What about cases where the side effect is associated with 
a load from memory rather than a store to memory?






lp1243022.patch


diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
index 57b4b3c..e710fa5 100644
--- a/gcc/lower-subreg.c
+++ b/gcc/lower-subreg.c
@@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn)
mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0);
minsn = emit_move_insn (real_dest, mdest);

+#ifdef AUTO_INC_DEC
+  /* Copy the REG_INC notes.  */
+  if (MEM_P (real_dest) && !(resolve_reg_p (real_dest)
+|| resolve_subreg_p (real_dest)))
+   {
+ rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
+ if (note)
+   {
+ if (!REG_NOTES (minsn))
+   REG_NOTES (minsn) = note;
+ else
+   add_reg_note (minsn, REG_INC, note);
+   }
+   }
+#endif
If MINSN does not have any notes, then this results in MINSN and INSN 
sharing the note.  Note carefully that notes are chained (see 
implementation of add_reg_note).  Thus the sharing would result in MINSN 
and INSN actually sharing a chain of notes.  I'm pretty sure that's not 
what you intended.  I think you need to always use add_reg_note.


Jeff
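
For concreteness, the shape Jeff is asking for would be along these lines
(a sketch, assuming the note's register operand XEXP (note, 0) is what
should be copied):

  #ifdef AUTO_INC_DEC
    /* Copy the REG_INC note to the new move.  add_reg_note allocates a
       fresh EXPR_LIST node, so minsn gets its own note instead of
       sharing (a chain of) insn's notes.  */
    rtx note = find_reg_note (insn, REG_INC, NULL_RTX);
    if (note)
      add_reg_note (minsn, REG_INC, XEXP (note, 0));
  #endif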







Symtab, cgraph and varpool nodes are now a real class hierarchy

2013-10-29 Thread David Malcolm
On Mon, 2013-10-28 at 23:37 -0600, Jeff Law wrote:
> On 10/28/13 20:57, David Malcolm wrote:
> >>>   * cgraph.h (symtab_node_base): Convert to a class;
> >>>   add GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"))).
> >>>   (cgraph_node): Inherit from symtab_node; add GTY option
> >>>   tag ("SYMTAB_FUNCTION").
> >>>   (varpool_node): Inherit from symtab_node; add GTY option
> >>>   tag ("SYMTAB_VARIABLE").
> >>>   (symtab_node_def): Remove.
> >>>   (is_a_helper <cgraph_node>::test (symtab_node_def *)): Convert to...
> >>>   (is_a_helper <cgraph_node>::test (symtab_node_base *)): ...this.
> >>>   (is_a_helper <varpool_node>::test (symtab_node_def *)): Convert to...
> >>>   (is_a_helper <varpool_node>::test (symtab_node_base *)): ...this.
> >>>
> >>>   * ipa-ref.h (symtab_node_def): Drop.
> >>>   (symtab_node): Change underlying type from symtab_node_def to
> >>>   symtab_node_base.
> >>>   (const_symtab_node): Likewise.
> >>>
> >>>   * is-a.h: Update examples in comment.
> >>>
> >>>   * symtab.c (symtab_hash): Change symtab_node_def to symtab_node_base.
> >>>   (assembler_name_hash): Likewise.
> >>
> >> This patch is OK.  Thanks for working on this!
> >
> > These symtab changes were dependent on having gengtype support for
> > inheritance, which is now in trunk, so I'm now revisiting these patches.
> >
> > The above patch hasn't bitrotted, though the autogenerated one that goes
> > with it needed regenerating.
> >
> > A new version of the autogenerated patch can be seen at:
> > http://dmalcolm.fedorapeople.org/gcc/large-patches/eaba9669644c84592ea32be2dcd19ba92beca381-0003-Autogenerated-fixes-of-symbol.-to.patch
> > Is that new patch OK?  (it's 450KB so one's eyes tend to glaze over
> > after a while, but FWIW you approved an earlier version of that in:
> >http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00730.html
> > and the test suite for the script that generated the patch can be seen
> > at:
> > https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/test_refactor_symtab.py
> >  )
> Given it's already approved and hasn't changed in any substantial way, 
> it's OK for the trunk once you get through the bootstrap and regression 
> testing.
Thanks.

The conversion of the symtab types from inheritance-in-C to a C++ class
hierarchy is now in trunk: I committed the manual parts of the
conversion as r204170, and the automated part as r204171.  I then
noticed that this broke the gdb debugging hooks for printing a
(cgraph_node *).  I tested and committed the attached fix for them as
r204174.

As a followup, I'm next looking at the renaming of the symtab types
previously discussed with Honza (on 2013-09-10).

Dave
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 204173)
+++ gcc/ChangeLog	(revision 204174)
@@ -1,3 +1,10 @@
+2013-10-29  David Malcolm  
+
+	* gdbhooks.py (CGraphNodePrinter.to_string): Update gdb
+	prettyprinter for cgraph_node to reflect the conversion of the
+	symtable types to a C++ class hierarchy: it now *is* a
+	symtab_node_base, rather than having one (named "symbol").
+
 2013-10-29  Balaji V. Iyer  
 
 	* builtins.c (is_builtin_name): Added a check for __cilkrts_detach and
Index: gcc/gdbhooks.py
===
--- gcc/gdbhooks.py	(revision 204173)
+++ gcc/gdbhooks.py	(revision 204174)
@@ -226,8 +226,7 @@
 # symtab_node_name calls lang_hooks.decl_printable_name
 # default implementation (lhd_decl_printable_name) is:
 #return IDENTIFIER_POINTER (DECL_NAME (decl));
-symbol = self.gdbval['symbol']
-tree_decl = Tree(symbol['decl'])
+tree_decl = Tree(self.gdbval['decl'])
 result += ' "%s"' % tree_decl.DECL_NAME().IDENTIFIER_POINTER()
 result += '>'
 return result
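
The shape of the change that broke the hook, reduced to a sketch (type
names simplified; the *_old/*_new names are invented for the comparison):

  typedef struct tree_node *tree;  /* stub for illustration */
  struct symtab_node_base { tree decl; /* ... */ };

  /* Before: containment -- a cgraph_node HAS a symtab_node_base, so the
     gdb hook reached the decl via gdbval['symbol']['decl'].  */
  struct cgraph_node_old {
    struct symtab_node_base symbol;  /* C-style "inheritance" */
    /* function-specific fields */
  };

  /* After: real inheritance -- a cgraph_node IS a symtab_node_base, so
     the decl is reached directly via gdbval['decl'].  */
  struct cgraph_node_new : symtab_node_base {
    /* function-specific fields */
  };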


Re: free is a killer

2013-10-29 Thread Jeff Law

On 10/29/13 00:38, Marc Glisse wrote:

On Mon, 28 Oct 2013, Jeff Law wrote:


On 10/28/13 16:05, Marc Glisse wrote:


I checked and it does the wrong thing (I don't have the testcase handy
anymore, but it shouldn't be hard to recreate one). I even wrote a patch
(attached), but it is related to:
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html
so it can't go in. A more limited fix (still missing many cases) would
be possible. I didn't test VA_END, but it doesn't look as bad (it is the
SSA_NAME case that is wrong for memcpy).

Actually, it's fairly easy to see with memset.  Something like this:

/* { dg-do compile } */
/* { dg-options "-O1 -fdump-tree-optimized" } */

void f (long *p) {
 *p = 42;
 p[4] = 42;
 __builtin_memset (p, 0, 100);
}

/* { dg-final { scan-tree-dump-not "= 42" "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */


Indeed. Do you want to commit it xfailed or put it in bugzilla so we
don't lose it? (it becomes harder if you replace p with p-1 in the
memset arguments).
BTW, you can get a VAR_DECL (and presumably a PARM_DECL) from 
ao_ref_base.  Some of the ssa-dse tests show this.


Jeff
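
The harder p-1 variant Marc mentions would look like this (a sketch
assuming sizeof (long) == 8, so the memset covers p[-1] through p[12] and
both stores are still dead):

  void g (long *p) {
    *p = 42;
    p[4] = 42;
    /* Same dead stores, but the memset base is now offset from p, so
       DSE has to reason about adjusted offsets, not just bases.  */
    __builtin_memset (p - 1, 0, 112);
  }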


Re: Symtab, cgraph and varpool nodes are now a real class hierarchy

2013-10-29 Thread Jan Hubicka
> 
> The conversion of the symtab types from inheritance-in-C to a C++ class
> hierarchy is now in trunk: I committed the manual parts of the
> conversion as r204170, and the automated part as r204171.  I then
> noticed that this broke the gdb debugging hooks for printing a
> (cgraph_node *).  I tested and committed the attached fix for them as
> r204174.
> 
> As a followup, I'm next looking at the renaming of the symtab types
> previously discussed with Honza (on 2013-09-10).

Thank you, David!

Honza
> 
> Dave

> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog (revision 204173)
> +++ gcc/ChangeLog (revision 204174)
> @@ -1,3 +1,10 @@
> +2013-10-29  David Malcolm  
> +
> + * gdbhooks.py (CGraphNodePrinter.to_string): Update gdb
> + prettyprinter for cgraph_node to reflect the conversion of the
> + symtable types to a C++ class hierarchy: it now *is* a
> + symtab_node_base, rather than having one (named "symbol").
> +
>  2013-10-29  Balaji V. Iyer  
>  
>   * builtins.c (is_builtin_name): Added a check for __cilkrts_detach and
> Index: gcc/gdbhooks.py
> ===
> --- gcc/gdbhooks.py   (revision 204173)
> +++ gcc/gdbhooks.py   (revision 204174)
> @@ -226,8 +226,7 @@
>  # symtab_node_name calls lang_hooks.decl_printable_name
>  # default implementation (lhd_decl_printable_name) is:
>  #return IDENTIFIER_POINTER (DECL_NAME (decl));
> -symbol = self.gdbval['symbol']
> -tree_decl = Tree(symbol['decl'])
> +tree_decl = Tree(self.gdbval['decl'])
>  result += ' "%s"' % tree_decl.DECL_NAME().IDENTIFIER_POINTER()
>  result += '>'
>  return result



Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins

2013-10-29 Thread Jeff Law

On 10/29/13 07:52, Ilya Enkovich wrote:


Yeah.  I'm working on it right now.  I've fixed known issues and now
I'm looking for others. Meanwhile, here is a new patch version with the
required renames and without the LTO restriction.
I can't help but be curious: what turned out to be the root cause of
those LTO problems?



2013-10-29  Ilya Enkovich  

* builtin-types.def (BT_FN_VOID_CONST_PTR): New.
(BT_FN_PTR_CONST_PTR): New.
(BT_FN_CONST_PTR_CONST_PTR): New.
(BT_FN_PTR_CONST_PTR_SIZE): New.
(BT_FN_PTR_CONST_PTR_CONST_PTR): New.
(BT_FN_VOID_PTRPTR_CONST_PTR): New.
(BT_FN_VOID_CONST_PTR_SIZE): New.
(BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New.
* chkp-builtins.def: New.
* builtins.def: Include chkp-builtins.def.
(DEF_CHKP_BUILTIN): New.
* builtins.c (expand_builtin): Support BUILT_IN_CHKP_INIT_PTR_BOUNDS,
BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS,
BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS,
BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS,
BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS,
BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND,
BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL,
BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET,
BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND, BUILT_IN_CHKP_NARROW,
BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER.
* common.opt (fcheck-pointer-bounds): New.
* toplev.c (process_options): Check Pointer Bounds Checker is supported.
* doc/extend.texi: Document Pointer Bounds Checker built-in functions.

This is fine.  Please install.

Thanks,
jeff
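
A short usage sketch of the user-level builtins this patch introduces
(signatures as later documented in extend.texi; treat the names and
behavior here as illustrative):

  extern char buf[64];

  char
  read_checked (unsigned long i)
  {
    /* Associate the bounds [buf, buf + 64) with p.  */
    char *p = (char *) __builtin___bnd_set_ptr_bounds (buf, 64);
    /* Raise a bound violation if [p + i, p + i + 1) is out of bounds;
       a no-op when -fcheck-pointer-bounds is not in effect.  */
    __builtin___bnd_chk_ptr_bounds (p + i, 1);
    return p[i];
  }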


  1   2   >