Re: [PATCH] Fix PR54826

2012-10-06 Thread Jakub Jelinek
On Fri, Oct 05, 2012 at 07:24:53PM -0700, Dehao Chen wrote:
> This patch fixes PR54826. When lowering the gimple, the block for call
> arg also need to be reset.
> 
> Bootstrapped and passed gcc regression test on x86.
> 
> Okay for trunk?
> 
> Thanks,
> Dehao
> 
> 2012-10-05  Dehao Chen  
> 
> * gimple-low.c (lower_stmt): Set the block for call args.

Yes, thanks.

> --- gcc/gimple-low.c  (revision 192147)
> +++ gcc/gimple-low.c  (working copy)
> @@ -425,7 +425,15 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lowe
>  case GIMPLE_CALL:
>{
>   tree decl = gimple_call_fndecl (stmt);
> + unsigned i;
>  
> + for (i = 0; i < gimple_call_num_args (stmt); i++)
> +   {
> + tree arg = gimple_call_arg (stmt, i);
> + if (EXPR_P (arg))
> +   TREE_SET_BLOCK (arg, data->block);
> +   }
> +
>   if (decl
>   && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
>   && DECL_FUNCTION_CODE (decl) == BUILT_IN_SETJMP)


Jakub


Re: [RFC] Make vectorizer to skip loops with small iteration estimate

2012-10-06 Thread Jan Hubicka
Hi,
I benchmarked the patch moving loop header copying and it is quite noticeable 
win.

Some testsuite updating is needed. In many cases it is just because the
optimizations are now happening earlier.
There are however few testusite failures I have torubles to deal with:
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/pr21559.c scan-tree-dump-times 
vrp1 "Threaded jump" 3
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2.c 
scan-tree-dump-times vrp1 "Jumps threaded: 1" 1
./testsuite/gcc/gcc.sum:FAIL: gcc.dg/vect/O3-slp-reduc-10.c 
scan-tree-dump-times vect "vectorized 1 loops" 2
./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++98  
scan-tree-dump-times vrp1 "if " 1
./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++11  
scan-tree-dump-times vrp1 "if " 1

This is mostly about VRP losing its ability to thread some jumps from the
duplicated loop header out of the loop across the loopback edge.  This seems to
be due to loop updating logic.  Do we care about these?

Honza

Index: tree-ssa-threadupdate.c
===
*** tree-ssa-threadupdate.c (revision 192123)
--- tree-ssa-threadupdate.c (working copy)
*** static bool
*** 846,854 
  def_split_header_continue_p (const_basic_block bb, const void *data)
  {
const_basic_block new_header = (const_basic_block) data;
!   return (bb != new_header
! && (loop_depth (bb->loop_father)
! >= loop_depth (new_header->loop_father)));
  }
  
  /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
--- 846,860 
  def_split_header_continue_p (const_basic_block bb, const void *data)
  {
const_basic_block new_header = (const_basic_block) data;
!   const struct loop *l;
! 
!   if (bb == new_header
!   || loop_depth (bb->loop_father) < loop_depth (new_header->loop_father))
! return false;
!   for (l = bb->loop_father; l; l = loop_outer (l))
! if (l == new_header->loop_father)
!   return true;
!   return false;
  }
  
  /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
Index: testsuite/gcc.dg/unroll_2.c
===
*** testsuite/gcc.dg/unroll_2.c (revision 192123)
--- testsuite/gcc.dg/unroll_2.c (working copy)
***
*** 1,5 
  /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
-fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo 
-fenable-rtl-loop2_unroll" } */
  
  unsigned a[100], b[100];
  inline void bar()
--- 1,5 
  /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
-fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo 
-fenable-rtl-loop2_unroll -fno-tree-dominator-opts" } */
  
  unsigned a[100], b[100];
  inline void bar()
Index: testsuite/gcc.dg/unroll_3.c
===
*** testsuite/gcc.dg/unroll_3.c (revision 192123)
--- testsuite/gcc.dg/unroll_3.c (working copy)
***
*** 1,5 
  /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
-fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } 
*/
  
  unsigned a[100], b[100];
  inline void bar()
--- 1,5 
  /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
-fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo 
-fno-tree-dominator-opts" } */
  
  unsigned a[100], b[100];
  inline void bar()
Index: testsuite/gcc.dg/torture/pr23821.c
===
*** testsuite/gcc.dg/torture/pr23821.c  (revision 192123)
--- testsuite/gcc.dg/torture/pr23821.c  (working copy)
***
*** 1,9 
  /* { dg-do compile } */
  /* { dg-skip-if "" { *-*-* } { "-O0" "-fno-fat-lto-objects" } { "" } } */
! /* At -O1 DOM threads a jump in a non-optimal way which leads to
 the bogus propagation.  */
! /* { dg-skip-if "" { *-*-* } { "-O1" } { "" } } */
! /* { dg-options "-fdump-tree-ivcanon-details" } */
  
  int a[199];
  
--- 1,8 
  /* { dg-do compile } */
  /* { dg-skip-if "" { *-*-* } { "-O0" "-fno-fat-lto-objects" } { "" } } */
! /* DOM threads a jump in a non-optimal way which leads to
 the bogus propagation.  */
! /* { dg-options "-fdump-tree-ivcanon-details -fno-tree-dominator-opts" } */
  
  int a[199];
  
Index: testsuite/gcc.dg/tree-ssa/ivopt_1.c
===
*** testsuite/gcc.dg/tree-ssa/ivopt_1.c (revision 192123)
--- testsuite/gcc.dg/tree-ssa/ivopt_1.c (working copy)
*** void foo (int i_width, TYPE dst, TYPE sr
*** 14,18 
  }
  
  
! /* { dg-final { scan-tree-dump-times "PHI 

[patch, fortran] PR 54833 Don't wrap calls to free(a) in if (a != NULL)

2012-10-06 Thread Thomas Koenig

Hello world,

the attached patch removes wrapping calls to free(a) by
if (a != NULL) for some cases.  It is not complete, because
automatic deallocation of allocatable structure components
is not yet covered.

OK for trunk?

Thomas

2012-10-06  Thomas König  

PR fortran/54833
* trans.c (gfc_call_free):  Do not wrap the
call to __builtin_free in check for NULL.
(gfc_deallocate_with_status):  For automatic
deallocation without status, don't wrap call
to __builtin_free in check for NULL.

2012-10-06  Thomas König  

PR fortran/54833
* gfortran.dg/auto_dealloc_3.f90:  New test.
Index: trans.c
===
--- trans.c	(Revision 191857)
+++ trans.c	(Arbeitskopie)
@@ -814,26 +814,23 @@ gfc_allocate_allocatable (stmtblock_t * block, tre
 }
 
 
-/* Free a given variable, if it's not NULL.  */
+/* Free a given variable.  If it is NULL, free takes care of this
+   automatically.  */
 tree
 gfc_call_free (tree var)
 {
   stmtblock_t block;
-  tree tmp, cond, call;
+  tree call;
 
   if (TREE_TYPE (var) != TREE_TYPE (pvoid_type_node))
 var = fold_convert (pvoid_type_node, var);
 
   gfc_start_block (&block);
   var = gfc_evaluate_now (var, &block);
-  cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node, var,
-			  build_int_cst (pvoid_type_node, 0));
   call = build_call_expr_loc (input_location,
 			  builtin_decl_explicit (BUILT_IN_FREE),
 			  1, var);
-  tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, cond, call,
-			 build_empty_stmt (input_location));
-  gfc_add_expr_to_block (&block, tmp);
+  gfc_add_expr_to_block (&block, call);
 
   return gfc_finish_block (&block);
 }
@@ -861,11 +858,10 @@ gfc_call_free (tree var)
 	}
 }
 
-   In this front-end version, status doesn't have to be GFC_INTEGER_4.
-   Moreover, if CAN_FAIL is true, then we will not emit a runtime error,
-   even when no status variable is passed to us (this is used for
-   unconditional deallocation generated by the front-end at end of
-   each procedure).
+   In this front-end version, status doesn't have to be GFC_INTEGER_4.  If
+   CAN_FAIL is true and no status variable is passed, we will simply call
+   free(). This is used for unconditional deallocation generated by the
+   front-end at end of each procedure.

If a runtime-message is possible, `expr' must point to the original
expression being deallocated for its locus and variable name.
@@ -890,6 +886,14 @@ gfc_deallocate_with_status (tree pointer, tree sta
   STRIP_NOPS (pointer);
 }
 
+  if (can_fail && status == NULL_TREE)
+{
+  tmp = build_call_expr_loc (input_location,
+ builtin_decl_explicit (BUILT_IN_FREE), 1,
+ fold_convert (pvoid_type_node, pointer));
+  return tmp;
+}
+
   cond = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, pointer,
 			  build_int_cst (TREE_TYPE (pointer), 0));
 
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
! PR fortran/54833
! Make sure we don't wrap a free() in an if(a.data != NULL) by
! counting the ifs.
subroutine foo
  real, allocatable :: a(:)
  allocate(a(10))
end subroutine foo
! { dg-final { scan-tree-dump-times "if" 3 "original" } }
! { dg-final { cleanup-tree-dump "original" } }



RFA: Simplifying truncation and integer lowpart subregs

2012-10-06 Thread Richard Sandiford
[cc:ing sh, spu and tilegx maintainers]

Richard Sandiford  writes:
> Andrew Pinski  writes:
>> On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak  wrote:
>>> 2012-09-27  Uros Bizjak  
>>>
>>> PR rtl-optimization/54457
>>> * simplify-rtx.c (simplify_subreg):
>>> Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
>>> to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
>>> the outer subreg is effectively a truncation to the original mode M.
>>
>>
>> When I was doing something similar on our internal toolchain at
>> Cavium.  I found doing this caused a regression on MIPS64 n32 in
>> gcc.c-torture/execute/20040709-1.c Where:
>>
>>
>> (insn 15 14 16 2 (set (reg/v:DI 200 [ y ])
>> (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit}
>>  (expr_list:REG_DEAD (reg:DI 2 $2)
>> (nil)))
>>
>> (insn 16 15 17 2 (set (reg:DI 210)
>> (zero_extract:DI (reg/v:DI 200 [ y ])
>> (const_int 29 [0x1d])
>> (const_int 0 [0]))) t.c:16 249 {extzvdi}
>>  (expr_list:REG_DEAD (reg/v:DI 200 [ y ])
>> (nil)))
>>
>> (insn 17 16 23 2 (set (reg:SI 211)
>> (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2}
>>  (expr_list:REG_DEAD (reg:DI 210)
>> (nil)))
>>
>> Gets converted to:
>> (insn 23 17 26 2 (set (reg/i:SI 2 $2)
>> (and:SI (reg:SI 2 $2 [+4 ])
>> (const_int 536870911 [0x1fff]))) t.c:18 156 {*andsi3}
>>  (nil))
>>
>> Which is considered an ext instruction
>>
>> And with the Octeon simulator which causes undefined arguments to
>> 32bit word operations to come out as 0xDEADBEEF which showed the
>> regression.  I fixed it by changing it to produce TRUNCATE instead of
>> the subreg.
>>
>> I did the simplification on ior/and rather than plus/minus/mult so the
>> issue is only when expanding to this to and/ior.
>
> Hmm, hadn't thought of that.  I think some of the existing subreg
> optimisations suffer the same problem.  I.e. we can't assume that
> subreg truncations of nested operands are OK just because the outer
> subreg is OK.
>
> I've got a patch I'm testing.

The idea is to split most of the lowpart subreg handling out of
simplify_subreg and apply it to TRUNCATE too.  There are three reasons:

- I wanted to make the !TRULY_NOOP_TRUNCATION truncation simplifications
  as similar to subreg truncation simplifications as possible.

- Some of the current lowpart subreg simplifications are also correct
  for vector truncations.

- Ideally, using simplify_gen_unary (TRUNCATE, ...) instead of
  simplify_gen_subreg shouldn't penalise TRULY_NOOP_TRUNCATION targets.
  There is already code to use gen_lowpart_no_emit for truncations that
  reduce to subregs, but as things stand, gen_lowpart_no_emit only
  passes objects like SUBREG, REG, MEM, etc., to simplify_gen_subreg;
  others go through gen_lowpart_SUBREG and get no recursive simplification.

We inherited this code from a 1996 patch (r13058):

  if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
   ? (num_sign_bit_copies (op, GET_MODE (op))
  > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op))
- GET_MODE_PRECISION (mode)))
...
return rtl_hooks.gen_lowpart_no_emit (mode, op);

I don't see any reason for the sign-bit check.  If truncations are noops,
we should be able to use a subreg regardless.

Other than removing that check, the patch just moves simplifications around.
I've not tried to match new patterns.

The other !TRULY_NOOP_TRUNCATION targets are sh64, spu and tilegx.
I don't think sh64 has any patterns that would be adversely affected,
although the patch ought to make these patterns redundant:

(define_insn_and_split "*logical_sidisi3"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
(truncate:SI (sign_extend:DI
(match_operator:SI 3 "logical_operator"
  [(match_operand:SI 1 "arith_reg_operand" "%r,r")
   (match_operand:SI 2 "logical_operand" "r,I10")]]
  "TARGET_SHMEDIA"
  "#"
  "&& 1"
  [(set (match_dup 0) (match_dup 3))])

(define_insn_and_split "*logical_sidi3_2"
  [(set (match_operand:DI 0 "arith_reg_dest" "=r,r")
(sign_extend:DI (truncate:SI (sign_extend:DI
(match_operator:SI 3 "logical_operator"
  [(match_operand:SI 1 "arith_reg_operand" "%r,r")
   (match_operand:SI 2 "logical_operand" "r,I10")])]
  "TARGET_SHMEDIA"
  "#"
  "&& 1"
  [(set (match_dup 0) (sign_extend:DI (match_dup 3)))])

combine should now simplify the first to the normal SI logical op
and the second to *logical_sidisi3.  I don't think any spu or tilegx
patterns are affected either way.

Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
by making sure that there were no code differences for a set of gcc .ii
files on gcc20 (-O2 -march=native).  OK to install?

Richard


gcc/
* machmode.h (GET_MODE_UNIT_PRECISIO

Re: [Patch, Fortran, F08] PR 45521: GENERIC resolution with ALLOCATABLE/POINTER and PROCEDURE

2012-10-06 Thread Mikael Morin
Hello,

Le 04/10/2012 00:06, Janus Weil a écrit :
> Hi all,
> 
> the attached patch implements an F08 feature, which allows to
> distinguish two specific procedures in a generic interface, based on
> the POINTER and ALLOCATABLE attribute of their arguments.
> 
> In addition to this, the patch fixes a bug in rejecting data actual
> arguments passed to procedure formal arguments.
> 
> The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk?
> 
Looks good, yes. Thanks.

Mikael


Re: [Patch,avr]: Remove -mshort-calls option

2012-10-06 Thread Denis Chertykov
2012/10/5 Georg-Johann Lay :
> As already discussed, this patch removes the -mshort-calls command option from
> avr-gcc.
>
> Ok to apply?
>
> If the change is on order, changes to wwwdocs will follow, i.e. deprecate the
> option in 4.7 and tell it is removed in the 4.8 caveats.
>
> Johann
>
>
> * doc/invoke.texi (AVR Options): Remove -mshort-calls.
> * config/avr/avr.opt (-mshort-calls): Remove option.
> * config/avr/avr.h (AVR_HAVE_JMP_CALL): Don't depend on
> TARGET_SHORT_CALLS.
>

Ok. Approved.

Denis.


Re: RFA: Simplifying truncation and integer lowpart subregs

2012-10-06 Thread Eric Botcazou
> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
> by making sure that there were no code differences for a set of gcc .ii
> files on gcc20 (-O2 -march=native).  OK to install?

Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is 
always correct?

> gcc/
>   * machmode.h (GET_MODE_UNIT_PRECISION): New macro.
>   * simplify-rtx.c (simplify_truncation): New function.

You should say where it comes from.

>   (simplify_unary_operation_1): Use it.  Remove sign bit test
>   for !TRULY_NOOP_TRUNCATION_MODES_P.

(simplify_unary_operation_1) : ...

>   (simplify_subreg): Use simplify_int_lowpart for TRUNCATE.

This function doesn't exist.  And this is misleading, it's not just for 
TRUNCATE, it's for a truncation to the lowpart.

>  /* Try to simplify a unary operation CODE whose output mode is to be
> MODE with input operand OP whose mode was originally OP_MODE.
> Return zero if no simplification can be made.  */
> @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod
>   op_mode = mode;
> in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode);
> 
> -   if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT)
> - {
> -   rtx tem = in2;
> -   in2 = in1; in1 = tem;
> - }
> -
> return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
>mode, in1, in2);
>   }

Why is this hunk here?

> @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer
>return NULL_RTX;
>  }
> 
> -  /* Merge implicit and explicit truncations.  */
> -
> -  if (GET_CODE (op) == TRUNCATE
> -  && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
> -  && subreg_lowpart_offset (outermode, innermode) == byte)
> -return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
> -GET_MODE (XEXP (op, 0)));
> -
>/* SUBREG of a hard register => just change the register number
>   and/or mode.  If the hard register is not valid in that mode,
>   suppress this simplification.  If the hard register is the stack,

Likewise.

-- 
Eric Botcazou


[SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF

2012-10-06 Thread Oleg Endo
Hello,

The attached patch improves comparisons such as
'unsigned int <= 0x7FFF' on SH.
As mentioned in the PR, for some reason, those comparisons do not go
through the cstore expander.  As a consequence the comparison doesn't
get the chance to be canonicalized by the target code and ends up as
'(~x) >> 31'.
I've not investigated this further and just fixed the symptoms on SH.  I
don't know whether it's also an issue on other targets.

Tested on rev 192142 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/54685
* config/sh/sh.md (one_cmplsi2): Make insn_and_split.  Add 
manual combine matching for an insn sequence where a ge:SI
pattern can be used.

testsuite/ChangeLog:

PR target/54685
* gcc.target/sh/pr54685.c: New.
Index: gcc/testsuite/gcc.target/sh/pr54685.c
===
--- gcc/testsuite/gcc.target/sh/pr54685.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54685.c	(revision 0)
@@ -0,0 +1,58 @@
+/* Check that a comparison 'unsigned int <= 0x7FFF' results in code
+   utilizing the cmp/pz instruction.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
+/* { dg-final { scan-assembler-not "not" } } */
+/* { dg-final { scan-assembler-times "cmp/pz" 7 } } */
+/* { dg-final { scan-assembler-times "shll" 1 } } */
+/* { dg-final { scan-assembler-times "movt" 4 } } */
+
+int
+test_00 (unsigned int a)
+{
+  return !(a > 0x7FFF);
+}
+
+int
+test_01 (unsigned int a)
+{
+  return !(a > 0x7FFF) ? -5 : 10;
+}
+
+int
+test_02 (unsigned int a)
+{
+  /* 1x shll, 1x movt  */
+  return a >= 0x8000;
+}
+
+int
+test_03 (unsigned int a)
+{
+  return a >= 0x8000 ? -5 : 10;
+}
+
+int
+test_04 (unsigned int a)
+{
+  return a <= 0x7FFF;
+}
+
+int
+test_05 (unsigned int a)
+{
+  return a <= 0x7FFF ? -5 : 10;
+}
+
+int
+test_06 (unsigned int a)
+{
+  return a < 0x8000;
+}
+
+int
+test_07 (unsigned int a)
+{
+  return a < 0x8000 ? -5 : 10;
+}
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192142)
+++ gcc/config/sh/sh.md	(working copy)
@@ -5188,11 +5188,61 @@
   "neg	%1,%0"
   [(set_attr "type" "arith")])
 
-(define_insn "one_cmplsi2"
+(define_insn_and_split "one_cmplsi2"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
 	(not:SI (match_operand:SI 1 "arith_reg_operand" "r")))]
   "TARGET_SH1"
   "not	%1,%0"
+  "&& can_create_pseudo_p ()"
+  [(set (reg:SI T_REG) (ge:SI (match_dup 1) (const_int 0)))
+   (set (match_dup 0) (reg:SI T_REG))]
+{
+/* PR 54685
+   If the result of 'unsigned int <= 0x7FFF' ends up as the following
+   sequence:
+
+ (set (reg0) (not:SI (reg0) (reg1)))
+ (parallel [(set (reg2) (lshiftrt:SI (reg0) (const_int 31)))
+		(clobber (reg:SI T_REG))])
+
+   ... match and combine the sequence manually in the split pass after the
+   combine pass.  Notice that combine does try the target pattern of this
+   split, but if the pattern is added it interferes with other patterns, in
+   particular with the div0s comparisons.
+   This could also be done with a peephole but doing it here before register
+   allocation can save one temporary.
+   When we're here, the not:SI pattern obviously has been matched already
+   and we only have to see whether the following insn is the left shift.  */
+
+  rtx i = next_nonnote_insn_bb (curr_insn);
+  if (i == NULL_RTX || !NONJUMP_INSN_P (i))
+FAIL;
+
+  rtx p = PATTERN (i);
+  if (GET_CODE (p) != PARALLEL || XVECLEN (p, 0) != 2)
+FAIL;
+
+  rtx p0 = XVECEXP (p, 0, 0);
+  rtx p1 = XVECEXP (p, 0, 1);
+
+  if (/* (set (reg2) (lshiftrt:SI (reg0) (const_int 31)))  */
+  GET_CODE (p0) == SET
+  && GET_CODE (XEXP (p0, 1)) == LSHIFTRT
+  && REG_P (XEXP (XEXP (p0, 1), 0))
+  && REGNO (XEXP (XEXP (p0, 1), 0)) == REGNO (operands[0])
+  && CONST_INT_P (XEXP (XEXP (p0, 1), 1))
+  && INTVAL (XEXP (XEXP (p0, 1), 1)) == 31
+
+  /* (clobber (reg:SI T_REG))  */
+  && GET_CODE (p1) == CLOBBER && REG_P (XEXP (p1, 0))
+  && REGNO (XEXP (p1, 0)) == T_REG)
+{
+  operands[0] = XEXP (p0, 0);
+  set_insn_deleted (i);
+}
+  else
+FAIL;
+}
   [(set_attr "type" "arith")])
 
 (define_expand "one_cmpldi2"


Re: Propagate profile counts during switch expansion

2012-10-06 Thread Jan Hubicka
> >
> >>
> >> > +
> >> > +  default_edge->count = default_count;
> >> > +  if (count)
> >> > +{
> >> > +  edge e;
> >> > +  edge_iterator ei;
> >> > +  FOR_EACH_EDGE (e, ei, stmt_bb->succs)
> >> > +e->probability = e->count * REG_BR_PROB_BASE / count;
> >> > +}
> >>
> >> Hmm, this updates origina BB containing the switch statement?
> > The out_edges of the original BB.
> >> Of course,
> >> modulo roundoff errors, this should hold.  I wonder where did you got the
> >> diferences and why do you need this?
> 
> The count of the outgoing edge of BB that jumps to the default label
> needs to be updated (to 0 or original_default_count/2, depending on
> whether default label is a jump table target). So I need to
> redistribute the probabilities of the rest of the edges. That's why I
> do this here.

Hmm, I see, OK then.  Please add a comment.

Honza
> 
> > Originally, I obtained e->probability as e->count * REG_BR_PROB_BASE /
> > bb->count. During profile bootstrap, I noticed BBs whose sum of
> > outgoing edge counts exceeded the BB's count and hence I normalized
> > with the sum of edge counts.
> >
> >> You are going to output new control flow
> >> and find_many_sub_basic_blocks will recompute all the counts/frequencies 
> >> inside
> >> anyway?
> > Sorry, I am lost here. I am just updating the probability of
> > stmt_bb->succs right?
> >
> >


Re: [patch, fortran] PR 54833 Don't wrap calls to free(a) in if (a != NULL)

2012-10-06 Thread Thomas Koenig

I wrote:


the attached patch removes wrapping calls to free(a) by
if (a != NULL) for some cases.  It is not complete, because
automatic deallocation of allocatable structure components
is not yet covered.


I accidentally posted an old version, which had a bug in
coarrays (basically was just missing an "else").  Regression-tested.

OK for trunk?

Thomas

2012-10-06  Thomas König  

PR fortran/54833
* trans.c (gfc_call_free):  Do not wrap the
call to __builtin_free in check for NULL.
(gfc_deallocate_with_status): For automatic deallocation without
status for non-coarrays, don't wrap call to __builtin_free in
check for NULL.

2012-10-06  Thomas König  

PR fortran/54833
* gfortran.dg/auto_dealloc_3.f90:  New test.

Index: trans.c
===
--- trans.c	(Revision 191857)
+++ trans.c	(Arbeitskopie)
@@ -814,26 +814,23 @@ gfc_allocate_allocatable (stmtblock_t * block, tre
 }
 
 
-/* Free a given variable, if it's not NULL.  */
+/* Free a given variable.  If it is NULL, free takes care of this
+   automatically.  */
 tree
 gfc_call_free (tree var)
 {
   stmtblock_t block;
-  tree tmp, cond, call;
+  tree call;
 
   if (TREE_TYPE (var) != TREE_TYPE (pvoid_type_node))
 var = fold_convert (pvoid_type_node, var);
 
   gfc_start_block (&block);
   var = gfc_evaluate_now (var, &block);
-  cond = fold_build2_loc (input_location, NE_EXPR, boolean_type_node, var,
-			  build_int_cst (pvoid_type_node, 0));
   call = build_call_expr_loc (input_location,
 			  builtin_decl_explicit (BUILT_IN_FREE),
 			  1, var);
-  tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, cond, call,
-			 build_empty_stmt (input_location));
-  gfc_add_expr_to_block (&block, tmp);
+  gfc_add_expr_to_block (&block, call);
 
   return gfc_finish_block (&block);
 }
@@ -861,11 +858,10 @@ gfc_call_free (tree var)
 	}
 }
 
-   In this front-end version, status doesn't have to be GFC_INTEGER_4.
-   Moreover, if CAN_FAIL is true, then we will not emit a runtime error,
-   even when no status variable is passed to us (this is used for
-   unconditional deallocation generated by the front-end at end of
-   each procedure).
+   In this front-end version, status doesn't have to be GFC_INTEGER_4.  If
+   CAN_FAIL is true, no status variable is passed and we are not dealing with
+   a coarray, we will simply call free().  This is used for unconditional
+   deallocation generated by the front-end at end of each procedure.

If a runtime-message is possible, `expr' must point to the original
expression being deallocated for its locus and variable name.
@@ -890,6 +886,14 @@ gfc_deallocate_with_status (tree pointer, tree sta
   STRIP_NOPS (pointer);
 }
 
+  else if (can_fail && status == NULL_TREE)
+{
+  tmp = build_call_expr_loc (input_location,
+ builtin_decl_explicit (BUILT_IN_FREE), 1,
+ fold_convert (pvoid_type_node, pointer));
+  return tmp;
+}
+
   cond = fold_build2_loc (input_location, EQ_EXPR, boolean_type_node, pointer,
 			  build_int_cst (TREE_TYPE (pointer), 0));
 
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
! PR fortran/54833
! Make sure we don't wrap a free() in an if(a.data != NULL) by
! counting the ifs.
subroutine foo
  real, allocatable :: a(:)
  allocate(a(10))
end subroutine foo
! { dg-final { scan-tree-dump-times "if" 3 "original" } }
! { dg-final { cleanup-tree-dump "original" } }



*ping* [patch, libfortran] Fix PR 54736, memory corruption with GFORTRAN_CONVERT_UNIT

2012-10-06 Thread Thomas Koenig

Am 01.10.2012 20:34, schrieb Thomas Koenig:

Hello world,

the previous version of the patch has an issue that Shane pointed
out in the PR.  This version should work; at least it survived
all the test cases I could come up with.

Regression-tested (again).  OK for trunk?  Also for 4.6 and 4.7?


Ping?

I would like to start committing patches so I don't have too many
of them in my tree at the same time :-)

Thomas



[m68k] Remove anddi3, iordi3, xordi3, one_cmpldi2 patterns

2012-10-06 Thread Andreas Schwab
Letting the DImode logical ops being split by lower subreg tends to
produce better code.  Tested on m68k-linux and committed.

Andreas.

PR rtl-optimization/54739
* config/m68k/m68k.md (anddi3, iordi3, xordi3, one_cmpldi2):
Remove.

Index: config/m68k/m68k.md
===
--- config/m68k/m68k.md (revision 192155)
+++ config/m68k/m68k.md (working copy)
@@ -3597,72 +3597,6 @@
 
 ;; logical-and instructions
 
-;; "anddi3" is mainly here to help combine().
-(define_insn "anddi3"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=o,d")
-   (and:DI (match_operand:DI 1 "general_operand" "%0,0")
-   (match_operand:DI 2 "general_operand" "dn,don")))]
-  "!TARGET_COLDFIRE"
-{
-  CC_STATUS_INIT;
-  /* We can get CONST_DOUBLE, but also const1_rtx etc.  */
-  if (CONSTANT_P (operands[2]))
-{
-  rtx hi, lo;
-
-  split_double (operands[2], &hi, &lo);
-
-  switch (INTVAL (hi))
-   {
- case 0 :
-   output_asm_insn ("clr%.l %0", operands);
-   break;
- case -1 :
-   break;
- default :
-   {
-   rtx xoperands[3];
-
-   xoperands[0] = operands[0];
-   xoperands[2] = hi;
-   output_asm_insn (output_andsi3 (xoperands), xoperands);
-   }
-   }
-  if (GET_CODE (operands[0]) == REG)
-   operands[0] = gen_rtx_REG (SImode, REGNO (operands[0]) + 1);
-  else
-   operands[0] = adjust_address (operands[0], SImode, 4);
-  switch (INTVAL (lo))
-   {
- case 0 :
-   output_asm_insn ("clr%.l %0", operands);
-   break;
- case -1 :
-   break;
- default :
-   {
-   rtx xoperands[3];
-
-   xoperands[0] = operands[0];
-   xoperands[2] = lo;
-   output_asm_insn (output_andsi3 (xoperands), xoperands);
-   }
-   }
-  return "";
-}
-  if (GET_CODE (operands[0]) != REG)
-{
-  operands[1] = adjust_address (operands[0], SImode, 4);
-  return "and%.l %2,%0\;and%.l %R2,%1";
-}
-  if (GET_CODE (operands[2]) != REG)
-{
-  operands[1] = adjust_address (operands[2], SImode, 4);
-  return "and%.l %2,%0\;and%.l %1,%R0";
-}
-  return "and%.l %2,%0\;and%.l %R2,%R0";
-})
-
 ;; Prevent AND from being made with sp.  This doesn't exist in the machine
 ;; and reload will cause inefficient code.  Since sp is a FIXED_REG, we
 ;; can't allocate pseudos into it.
@@ -3780,76 +3714,6 @@
 return "or%.w %1,%0";
 })
 
-;; "iordi3" is mainly here to help combine().
-(define_insn "iordi3"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=o,d")
-   (ior:DI (match_operand:DI 1 "general_operand" "%0,0")
-   (match_operand:DI 2 "general_operand" "dn,don")))]
-  "!TARGET_COLDFIRE"
-{
-  CC_STATUS_INIT;
-  /* We can get CONST_DOUBLE, but also const1_rtx etc.  */
-  if (CONSTANT_P (operands[2]))
-{
-  rtx hi, lo;
-
-  split_double (operands[2], &hi, &lo);
-
-  switch (INTVAL (hi))
-   {
- case 0 :
-   break;
- case -1 :
-   /* FIXME : a scratch register would be welcome here if operand[0]
-  is not a register */
-   output_asm_insn ("move%.l #-1,%0", operands);
-   break;
- default :
-   {
-   rtx xoperands[3];
-
-   xoperands[0] = operands[0];
-   xoperands[2] = hi;
-   output_asm_insn (output_iorsi3 (xoperands), xoperands);
-   }
-   }
-  if (GET_CODE (operands[0]) == REG)
-   operands[0] = gen_rtx_REG (SImode, REGNO (operands[0]) + 1);
-  else
-   operands[0] = adjust_address (operands[0], SImode, 4);
-  switch (INTVAL (lo))
-   {
- case 0 :
-   break;
- case -1 :
-   /* FIXME : a scratch register would be welcome here if operand[0]
-  is not a register */
-   output_asm_insn ("move%.l #-1,%0", operands);
-   break;
- default :
-   {
-   rtx xoperands[3];
-
-   xoperands[0] = operands[0];
-   xoperands[2] = lo;
-   output_asm_insn (output_iorsi3 (xoperands), xoperands);
-   }
-   }
-  return "";
-}
-  if (GET_CODE (operands[0]) != REG)
-{
-  operands[1] = adjust_address (operands[0], SImode, 4);
-  return "or%.l %2,%0\;or%.l %R2,%1";
-}
-  if (GET_CODE (operands[2]) != REG)
-{
-  operands[1] = adjust_address (operands[2], SImode, 4);
-  return "or%.l %2,%0\;or%.l %1,%R0";
-}
-  return "or%.l %2,%0\;or%.l %R2,%R0";
-})
-
 (define_expand "iorsi3"
   [(set (match_operand:SI 0 "nonimmediate_operand" "")
(ior:SI (match_operand:SI 1 "general_operand" "")
@@ -3957,79 +3821,6 @@
 
 ;; xor instructions
 
-;; "xordi3" is mainly here to help combine().
-(define_insn "xordi3"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=od")
-   (xor:DI (match_operand:DI 1

Re: *ping* [patch, libfortran] Fix PR 54736, memory corruption with GFORTRAN_CONVERT_UNIT

2012-10-06 Thread Steven Bosscher
On Sat, Oct 6, 2012 at 1:31 PM, Thomas Koenig  wrote:
> Am 01.10.2012 20:34, schrieb Thomas Koenig:
>>
>> Hello world,
>>
>> the previous version of the patch has an issue that Shane pointed
>> out in the PR.  This version should work; at least it survived
>> all the test cases I could come up with.
>>
>> Regression-tested (again).  OK for trunk?  Also for 4.6 and 4.7?
>
>
> Ping?
>
> I would like to start committing patches so I don't have too many
> of them in my tree at the same time :-)

This looks OK to me.

Ciao!
Steven


Re: [Patch, Fortran, F08] PR 45521: GENERIC resolution with ALLOCATABLE/POINTER and PROCEDURE

2012-10-06 Thread Janus Weil
Hi,

>> the attached patch implements an F08 feature, which allows to
>> distinguish two specific procedures in a generic interface, based on
>> the POINTER and ALLOCATABLE attribute of their arguments.
>>
>> In addition to this, the patch fixes a bug in rejecting data actual
>> arguments passed to procedure formal arguments.
>>
>> The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk?
>>
> Looks good, yes. Thanks.

thanks, Mikael. Committed as r192157.

Cheers,
Janus


Re: RFA: Simplifying truncation and integer lowpart subregs

2012-10-06 Thread Richard Sandiford
Thanks for the review.

Eric Botcazou  writes:
>> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
>> by making sure that there were no code differences for a set of gcc .ii
>> files on gcc20 (-O2 -march=native).  OK to install?
>
> Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is 
> always correct?

We only use TRUNCATEs when truncating the operands of an existing operation.
We don't convert the SUBREG itself to a TRUNCATE.  I don't think it matters
whether the narrowing of the operation was brought about by a TRUNCATE
or by a SUBREG.

I think modelling it as a TRUNCATE operation is correct for
!TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
And we shouldn't generate an actual TRUNCATE rtx for
TRULY_NOOP_TRUNCATION (the thing about making
simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
for those targets).  I suppose:

  /* We can't handle truncation to a partial integer mode here
 because we don't know the real bitsize of the partial
 integer mode.  */
  if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
break;

might be a problem though; we should still allow a subreg to be
generated.  Is that what you were thinking of, or something else?

I'm certainly open to other ideas though.  Or do you think that I was
wrong about doing this narrowing in simplify_subreg to begin with?

>> gcc/
>>  * machmode.h (GET_MODE_UNIT_PRECISION): New macro.
>>  * simplify-rtx.c (simplify_truncation): New function.
>
> You should say where it comes from.

OK.

>>  (simplify_unary_operation_1): Use it.  Remove sign bit test
>>  for !TRULY_NOOP_TRUNCATION_MODES_P.
>
>   (simplify_unary_operation_1) : ...
>
>>  (simplify_subreg): Use simplify_int_lowpart for TRUNCATE.
>
> This function doesn't exist.  And this is misleading, it's not just for 
> TRUNCATE, it's for a truncation to the lowpart.

Curses, forgot to update that part.  Thanks.

>>  /* Try to simplify a unary operation CODE whose output mode is to be
>> MODE with input operand OP whose mode was originally OP_MODE.
>> Return zero if no simplification can be made.  */
>> @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod
>>  op_mode = mode;
>>in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode);
>> 
>> -  if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT)
>> -{
>> -  rtx tem = in2;
>> -  in2 = in1; in1 = tem;
>> -}
>> -
>>return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
>>   mode, in1, in2);
>>  }
>
> Why is this hunk here?

Sorry, I've no idea :-(

>> @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer
>>return NULL_RTX;
>>  }
>> 
>> -  /* Merge implicit and explicit truncations.  */
>> -
>> -  if (GET_CODE (op) == TRUNCATE
>> -  && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
>> -  && subreg_lowpart_offset (outermode, innermode) == byte)
>> -return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
>> -   GET_MODE (XEXP (op, 0)));
>> -
>>/* SUBREG of a hard register => just change the register number
>>   and/or mode.  If the hard register is not valid in that mode,
>>   suppress this simplification.  If the hard register is the stack,
>
> Likewise.

This moved to simplify_truncation:

  /* (truncate:A (truncate:B X)) is (truncate:A X).  */
  if (GET_CODE (op) == TRUNCATE)
return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0),
   GET_MODE (XEXP (op, 0)));

I haven't included an updated patch because of your general concern above.

Richard


Re: RFA: Simplifying truncation and integer lowpart subregs

2012-10-06 Thread Eric Botcazou
> I think modelling it as a TRUNCATE operation is correct for
> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
> And we shouldn't generate an actual TRUNCATE rtx for
> TRULY_NOOP_TRUNCATION (the thing about making
> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
> for those targets).  I suppose:
> 
>   /* We can't handle truncation to a partial integer mode here
>  because we don't know the real bitsize of the partial
>  integer mode.  */
>   if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
> break;
> 
> might be a problem though; we should still allow a subreg to be
> generated.  Is that what you were thinking of, or something else?

I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations 
aren't equivalent.  Generating TRUNCATE in simplify_subreg seems suspicious to 
me in this case but, if not doing it is the source of the bug, I guess I need 
to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-)

Maybe add a blurb to the head comment of simplify_truncation, explaining that
it is valid to call the function both for TRUNCATEs and truncations to the 
lowpart, and why it is correct to generate new TRUNCATEs in the latter case.

-- 
Eric Botcazou


Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing

2012-10-06 Thread Oleg Endo
On Sat, 2012-10-06 at 12:31 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > The attached patch is the next step that adds the thread pointer
> > builtins.  The GBR address mode stuff will follow afterwards separately.
> > Tested on rev 192142 with 'make all' and
> > 'make -k check-gcc RUNTESTFLAGS="sh.exp --target_board=sh-sim
> > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"'
> 
> The patch is OK.

Installed.
Attached patch adds the GBR addressing mode stuff.  I've added the 
'!NONJUMP_INSN_P (i)' check which should fix the crash.  Could you
please test again?

Tested on rev 192154 with 'make all' and
make -k check-gcc RUNTESTFLAGS="sh.exp=pr54760* --target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* config/sh/sh.md (*mov_gbr_load, *mov_gbr_store):
New insns and accompanying unnamed splits.
* config/sh/predicates.md (general_movsrc_operand,
general_movdst_operand): Reject GBR addresses.
* config/sh/sh-protos.h (sh_find_equiv_gbr_addr): New
declaration.
* config/sh/sh.c (sh_address_cost, sh_legitimate_address_p, 
sh_secondary_reload): Handle GBR addresses.
(base_reg_disp): New class.
(sh_find_base_reg_disp, sh_find_equiv_gbr_addr): New functions.

testsuite/ChangeLog:

PR target/54760
* gcc.target/sh/pr54760-2.c: New.
* gcc.target/sh/pr54760-3.c: New.

Index: gcc/testsuite/gcc.target/sh/pr54760-2.c
===
--- gcc/testsuite/gcc.target/sh/pr54760-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr54760-2.c	(revision 0)
@@ -0,0 +1,223 @@
+/* Check that thread pointer relative memory accesses are converted to
+   gbr displacement address modes.  If we see a gbr register store
+   instruction something is not working properly.  */
+/* { dg-do compile { target "sh*-*-*" } } */
+/* { dg-options "-O1" } */
+/* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } }  */
+/* { dg-final { scan-assembler-times "stc\tgbr" 0 } } */
+
+/* ---
+  Simple GBR load.
+*/
+#define func(name, type, disp)\
+  int \
+  name ## _tp_load (void) \
+  { \
+type* tp = (type*)__builtin_thread_pointer (); \
+return tp[disp]; \
+  }
+
+func (test00, int, 0)
+func (test01, int, 5)
+func (test02, int, 255)
+
+func (test03, short, 0)
+func (test04, short, 5)
+func (test05, short, 255)
+
+func (test06, char, 0)
+func (test07, char, 5)
+func (test08, char, 255)
+
+func (test09, unsigned int, 0)
+func (test10, unsigned int, 5)
+func (test11, unsigned int, 255)
+
+func (test12, unsigned short, 0)
+func (test13, unsigned short, 5)
+func (test14, unsigned short, 255)
+
+func (test15, unsigned char, 0)
+func (test16, unsigned char, 5)
+func (test17, unsigned char, 255)
+
+#undef func
+
+/* ---
+  Simple GBR store.
+*/
+#define func(name, type, disp)\
+  void \
+  name ## _tp_store (int a) \
+  { \
+type* tp = (type*)__builtin_thread_pointer (); \
+tp[disp] = (type)a; \
+  }
+
+func (test00, int, 0)
+func (test01, int, 5)
+func (test02, int, 255)
+
+func (test03, short, 0)
+func (test04, short, 5)
+func (test05, short, 255)
+
+func (test06, char, 0)
+func (test07, char, 5)
+func (test08, char, 255)
+
+func (test09, unsigned int, 0)
+func (test10, unsigned int, 5)
+func (test11, unsigned int, 255)
+
+func (test12, unsigned short, 0)
+func (test13, unsigned short, 5)
+func (test14, unsigned short, 255)
+
+func (test15, unsigned char, 0)
+func (test16, unsigned char, 5)
+func (test17, unsigned char, 255)
+
+#undef func
+
+/* ---
+  Arithmetic on the result of a GBR load.
+*/
+#define func(name, type, disp, op, opname)\
+  int \
+  name ## _tp_load_arith_ ##opname (int a) \
+  { \
+type* tp = (type*)__builtin_thread_pointer (); \
+return tp[disp] op a; \
+  }
+
+#define funcs(op, opname) \
+  func (test00, int, 0, op, opname) \
+  func (test01, int, 5, op, opname) \
+  func (test02, int, 255, op, opname) \
+  func (test03, short, 0, op, opname) \
+  func (test04, short, 5, op, opname) \
+  func (test05, short, 255, op, opname) \
+  func (test06, char, 0, op, opname) \
+  func (test07, char, 5, op, opname) \
+  func (test08, char, 255, op, opname) \
+  func (test09, unsigned int, 0, op, opname) \
+  func (test10, unsigned int, 5, op, opname) \
+  func (test11, unsigned int, 255, op, opname) \
+  func (test12, unsigned short, 0, op, opname) \
+  func (test13, unsigned short, 5, op, opname) \
+  func (test14, unsigned short, 255, op, opname) \
+  func (test15, unsigned char, 0, op, opname) \
+  func (test16, unsigned char, 5, op, opname) \
+  func (test17, unsigned char, 255, op, opname) \
+
+funcs (+, plus)
+funcs (-, minus)
+funcs (*, mul)
+funcs (&,

Re: [C++ Patch] PR 52764

2012-10-06 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch] PR 54249

2012-10-06 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch / RFC] PR 51422

2012-10-06 Thread Jason Merrill

On 09/27/2012 07:08 AM, Paolo Carlini wrote:

Then checking error_operand_p (decl) in is_capture_proxy solves the
problem but now the question is: do we have reasons to believe that such
VAR_DECLs should never ever reach is_normal_capture_proxy?


That depends on our error recovery strategy for an invalid capture.  It 
seems that we currently build a VAR_DECL that has an error_mark_node 
DECL_VALUE_EXPR; if we're going to do that, we need to deal with it.  I 
guess we should return true in that case, if it doesn't cause more 
problems later on.


Jason



[Patch, Fortran, committed] PR 54832

2012-10-06 Thread Janus Weil
> This is now:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54832

... which I have just fixed by an (obvious and regtested) patch:

http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192160

Thanks again for reporting this,
Janus



> 2012/10/6 Rouson, Damian :
>> The Gfortran 4.8.0 20120930 ICE shown below appears to stem from the
>> "result" keyword used in the first interface body.  The code works fine
>> with gfortran 4.7.2.  This is a simplified version of code from my book:
>>
>> $ cat integrand.F90
>> module integrand_module
>>
>>   type ,abstract :: integrand
>>   contains
>> procedure(t_interface), deferred :: t
>> procedure(assign_interface), deferred :: assign
>> procedure(times_interface), deferred :: times
>> generic :: operator(*) => times
>> generic :: assignment(=) => assign
>>   end type
>>
>>   abstract interface
>> function t_interface(this) result(dState_dt)
>>   import :: integrand
>>   class(integrand) ,intent(in)  :: this
>>   class(integrand) ,allocatable :: dState_dt
>> end function
>> function times_interface(lhs,rhs)
>>   import :: integrand
>>   class(integrand) ,intent(in)  :: lhs
>>   class(integrand) ,allocatable :: times_interface
>>   real, intent(in)  :: rhs
>> end function
>> subroutine assign_interface(lhs,rhs)
>>   import :: integrand
>>   class(integrand) ,intent(in):: rhs
>>   class(integrand) ,intent(inout) :: lhs
>> end subroutine
>>   end interface
>>
>> contains
>>   subroutine integrate(model,dt)
>> class(integrand) :: model
>> real dt
>> model = model%t()*dt
>>end subroutine
>> end module
>> $ gfortran -c integrand.F90
>> integrand.F90: In function 'integrate':
>> integrand.F90:35:0: internal compiler error: in gfc_conv_expr, at
>> fortran/trans-expr.c:5862
>>  model = model%t()*dt
>>  ^
>>
>> integrand.F90:35:0: internal compiler error: Abort trap: 6
>> gfortran: internal compiler error: Abort trap: 6 (program f951)
>> Abort trap: 6
>> remote-ca41:gnu-regression rouson$ gfortran --version
>> GNU Fortran (MacPorts gcc48 4.8-20120930_1) 4.8.0 20120930 (experimental)
>> Copyright (C) 2012 Free Software Foundation, Inc.
>>
>>
>>


Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-06 Thread Jan Hubicka
> > Index: passes.c
> > +/* Hold statistic about profile consistency.  */
> ...
> 
> I don't see why this should live in passes.c, can you please put it in
> a more logical place (profile.c, perhaps)?

Hmm, the problem here is that the code is using passmanager's dumping bits
to order the passes and assign them names.  So moving it elsewhere requires
exporting it that is not nice.  passes.c does similar stuff already, so I 
decided
to keep it there.

Here is patch I comitted that prints more detailed report.  It reports also 
changes
in overall unit time/size estimates and when pass did nothing it reports it as
uneffective.  For tramp3d it looks this way:

Pass name|mismatch in |mismated out|Overall
 |freq count  |freq count  |size   time
cfg  (after TODO)|||  -1.5474%
ssa  |||  -2.0148%
inline_param |||
einline  |||  -0.4991%
einline  (after TODO)|||  -0.0129%
early_optimizations  |||
copyrename   |||
ccp  |||  -0.2273%
forwprop |||  -0.0688%
ealias   |||
esra |||  -0.1892%
fre  |||  -7.9369%
copyprop (after TODO)|||  -0.0187%
mergephi |||
cddce|||  -0.1655%
eipa_sra |||  -0.0237%
tailr|||  -0.5305%
switchconv   |||  +0.0190%
profile_estimate |+1  ||
local-pure-const |||
fnsplit  |   +20  ||  +0.2333%  +0.8267%
fnsplit  (after TODO)|   -20  ||  -1.8146%  -1.3300%
release_ssa  |||
inline_param |||
inline   |  +229  || +14.5138% -10.6020%
inline   (after TODO)|  -225  ||  -0.2662% -10.2195%
copyrename   |||  -2.7026%  -2.9448%
cunrolli |||
ccp  |||  -0.0174%  -0.0142%
ccp  (after TODO)|+8  ||  -0.3571%  -0.1213%
forwprop |||  -0.1442%  -0.5343%
alias|||
retslot  |||
phiprop  |||
fre  |||  -0.2801%  -0.2814%
fre  (after TODO)|||  -0.2897%  -0.0362%
copyprop |||  -0.0264%  -0.0251%
mergephi |||
vrp  |   +80  ||  +0.1233%  -1.0271%
vrp  (after TODO)|+3  ||  -0.8618%  -0.2421%
dce  |||  -0.0089%  -0.0021%
dce  (after TODO)|-7  ||  -0.0089%  -0.0347%
cdce |||
cselim   |||  -0.0177%  +0.%
ifcombine|||  +0.0044%  +0.0001%
ifcombine(after TODO)|||  -0.0089%  +0.%
phiopt   |||  -0.1730%  -0.1203%
tailr|||
ch   |+2  ||  +2.0535%  +0.%
ch   (after TODO)|||+0.0002%
cplxlower|||
sra  |||  +0.0087%
copyrename   |||
dom  |   +86  ||  +1.1802%  -0.1366%
dom  (after TODO)|+5  ||  -0.1894%  -0.3042%
phicprop |||  -0.0043%
phicprop (after TODO)|-6  ||+0.2117%
dse  |||  -0.0086%  -0.0009%
re

Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-06 Thread Steven Bosscher
On Sat, Oct 6, 2012 at 4:10 PM, Jan Hubicka  wrote:
>> > Index: passes.c
>> > +/* Hold statistic about profile consistency.  */
>> ...
>>
>> I don't see why this should live in passes.c, can you please put it in
>> a more logical place (profile.c, perhaps)?
>
> Hmm, the problem here is that the code is using passmanager's dumping bits
> to order the passes and assign them names.  So moving it elsewhere requires
> exporting it that is not nice.  passes.c does similar stuff already, so I 
> decided
> to keep it there.

I think this is not the right decision. We can also throw _all_ code
on one pile because there are inter-dependencies. Or they can be fixed
with a proper interface.

> If there are no complains I will commit the patch tomorrow.

+1 complaint.
You're putting profile stuff and even RTL stuff in the pass manager.
That is Just Wrong.

Ciao!
Steven


[patch] fix libbacktrace build failure on arm-linux

2012-10-06 Thread Matthias Klose
current trunk fails to build on arm-linux with:

In file included from ../../../../src/libbacktrace/backtrace.c:35:0:
../libgcc/unwind.h: In function '_Unwind_decode_typeinfo_ptr':
../libgcc/unwind.h:42:45: error: unused parameter 'base' 
[-Werror=unused-parameter]
   _Unwind_decode_typeinfo_ptr (_Unwind_Word base, _Unwind_Word ptr)
 ^
../libgcc/unwind.h: In function '__gnu_unwind_24bit':
../libgcc/unwind.h:68:41: error: unused parameter 'context'
[-Werror=unused-parameter]
   __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
 ^
../libgcc/unwind.h:68:54: error: unused parameter 'data' 
[-Werror=unused-parameter]
   __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
  ^
../libgcc/unwind.h:68:64: error: unused parameter 'compact'
[-Werror=unused-parameter]
   __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
^
cc1: all warnings being treated as errors
make[8]: *** [backtrace.lo] Error 1

the immediate fix is to mark all arguments as unused, however I don't know if
this function should be used by libbacktrace, if it returns _URC_FAILURE
unconditionally.

* config/arm/unwind-arm.h (__gnu_unwind_24bit): Mark parameters
as unused.


--- libgcc/config/arm/unwind-arm.h  (revision 192162)
+++ libgcc/config/arm/unwind-arm.h  (working copy)
@@ -64,8 +64,11 @@
   return tmp;
 }

+#define __unused __attribute__((unused))
+
   static inline _Unwind_Reason_Code
-  __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
+  __gnu_unwind_24bit (_Unwind_Context * context __unused, _uw data __unused,
+  int compact __unused)
 {
   return _URC_FAILURE;
 }


Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-06 Thread Graham Stott

Jan.

This patch also breaks bootstrap due compilation errors reported for 
pases.c and toplev.c


Graham


Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-06 Thread Jan Hubicka
> Jan.
> 
> This patch also breaks bootstrap due compilation errors reported for
> pases.c and toplev.c

Sorry for that.  I swapped files and used old version of the patch. It should
be fixed now.

Honza
> 
> Graham


Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-06 Thread Steven Bosscher
On Sat, Oct 6, 2012 at 4:39 PM, Steven Bosscher  wrote:

>> If there are no complains I will commit the patch tomorrow.
>
> +1 complaint.
> You're putting profile stuff and even RTL stuff in the pass manager.
> That is Just Wrong.

You already committed the patch. Your tomorrow started early? ;-)

Ciao!
Steven


Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-06 Thread Jan Hubicka
> On Sat, Oct 6, 2012 at 4:10 PM, Jan Hubicka  wrote:
> >> > Index: passes.c
> >> > +/* Hold statistic about profile consistency.  */
> >> ...
> >>
> >> I don't see why this should live in passes.c, can you please put it in
> >> a more logical place (profile.c, perhaps)?
> >
> > Hmm, the problem here is that the code is using passmanager's dumping bits
> > to order the passes and assign them names.  So moving it elsewhere requires
> > exporting it that is not nice.  passes.c does similar stuff already, so I 
> > decided
> > to keep it there.
> 
> I think this is not the right decision. We can also throw _all_ code
> on one pile because there are inter-dependencies. Or they can be fixed
> with a proper interface.
> 
> > If there are no complains I will commit the patch tomorrow.
> 
> +1 complaint.
> You're putting profile stuff and even RTL stuff in the pass manager.
> That is Just Wrong.

Hmm, the code really is about collecting statistics of individual passes.
I do not think it is that different from rest of TODO handling logic, but
I do not care much. It is not supposed to be something that will grow over
the time and become essential part of the compiler.

The code uses the CFG/Gimple/RTL interfaces that are both used by many parts of
the compiler (including the passmanager) + passmanager's passes_by_id and
passes_by_id_size that are not exported.

profile.c is not good place because that is only about reading profile
feedback.  cfg.c or predict.c seems better because that is where most of
cfg/profile API lives.  I will probably move it to cfg.c, because it should be
in sync with the BB dumping code that should report the same inconsistencies as
the statistics loop.

Would you preffer exporting profile_record structure and putting the actual
walk of CFG into cfg.c or exporting the passmanager's passes_by_id used by the
dumping? (actually passes_by_id misses "static" in the declaratoin, so it is
matter of putting it to the header)  I will implement one of those.

Honza
> 
> Ciao!
> Steven


Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-06 Thread Jan Hubicka
> On Sat, Oct 6, 2012 at 4:39 PM, Steven Bosscher  wrote:
> 
> >> If there are no complains I will commit the patch tomorrow.
> >
> > +1 complaint.
> > You're putting profile stuff and even RTL stuff in the pass manager.
> > That is Just Wrong.
> 
> You already committed the patch. Your tomorrow started early? ;-)

The tomorrow you cite is from Thursday :)

Honza


Re: patch to fix constant math - third small patch

2012-10-06 Thread Kenneth Zadeck

This is the third patch in the series of patches to fix constant math.
this one changes some predicates at the rtl level to use the new 
predicate CONST_SCALAR_INT_P.

I did not include a few that were tightly intertwined with other changes.

Not all of these changes are strictly mechanical.   Richard, when 
reviewing this had me make additional changes to remove what he thought 
were latent bugs at the rtl level.   However, it appears that the bugs 
were not latent.I do not know what is going on here but i am smart 
enough to not look a gift horse in the mouth.


All of this was done on the same machine with no changes and identical 
configs.  It is an x86-64 with ubuntu 12-4.


ok for commit?

in the logs below, gbBaseline is a trunk from friday and the gbWide is 
the same revision but with my patches.  Some of this like 
gfortran.dg/pr32627 is obviously flutter, but the rest does not appear 
to be.


=
heracles:~/gcc(13) gccBaseline/contrib/compare_tests 
gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log

New tests that PASS:

gcc.dg/builtins-85.c scan-assembler mysnprintf
gcc.dg/builtins-85.c scan-assembler-not __chk_fail
gcc.dg/builtins-85.c (test for excess errors)


heracles:~/gcc(14) gccBaseline/contrib/compare_tests 
gbBaseline/gcc/testsuite/gfortran/gfortran.log 
gbWide/gcc/testsuite/gfortran/gfortran.log

New tests that PASS:

gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-loops (test 
for excess errors)

gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer  (test for excess errors)
gfortran.dg/pr32627.f03  -Os  (test for excess errors)
gfortran.dg/pr32635.f  -O0  execution test
gfortran.dg/pr32635.f  -O0  (test for excess errors)
gfortran.dg/substr_6.f90  -O2  (test for excess errors)

Old tests that passed, that have disappeared: (Eeek!)

gfortran.dg/pr32627.f03  -O1  (test for excess errors)
gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)

gfortran.dg/pr32627.f03  -O3 -g  (test for excess errors)
gfortran.dg/substring_equivalence.f90  -O  (test for excess errors)
Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as 
tool-and-target-specific interface file.


=== g++ Summary ===

# of expected passes49793
# of expected failures284
# of unsupported tests601

runtest completed at Fri Oct  5 16:10:20 2012
heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log Using 
/usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as 
tool-and-target-specific interface file.


=== g++ Summary ===

# of expected passes50472
# of expected failures284
# of unsupported tests613

runtest completed at Fri Oct  5 19:51:50 2012







Re: Profile housekeeping 6/n (-fprofile-consistency-report)

2012-10-06 Thread Jan Hubicka
Hi,
does this look better? Moving to cfg.c would importing tree-pass.h and rtl.h
that is not cool either. predict.c does all of these.
Obviously can also go to a separate file, if preferred.

If it looks fine to you, I will commit it after testing.

Honza

Index: tree-pass.h
===
*** tree-pass.h (revision 192158)
--- tree-pass.h (working copy)
*** extern void do_per_function_toporder (vo
*** 547,551 
--- 547,553 
  extern void disable_pass (const char *);
  extern void enable_pass (const char *);
  extern void dump_passes (void);
+ extern struct opt_pass **passes_by_id;
+ extern int passes_by_id_size;
  
  #endif /* GCC_TREE_PASS_H */
Index: predict.c
===
*** predict.c   (revision 192158)
--- predict.c   (working copy)
*** rebuild_frequencies (void)
*** 2799,2801 
--- 2799,3005 
  gcc_unreachable ();
timevar_pop (TV_REBUILD_FREQUENCIES);
  }
+ 
+ /* Make statistic about profile consistency.  */
+ 
+ struct profile_record
+ {
+   int num_mismatched_freq_in[2];
+   int num_mismatched_freq_out[2];
+   int num_mismatched_count_in[2];
+   int num_mismatched_count_out[2];
+   bool run;
+   gcov_type time[2];
+   int size[2];
+ };
+ 
+ static struct profile_record *profile_record;
+ 
+ static void
+ check_profile_consistency (int index, int subpass, bool run)
+ {
+   basic_block bb;
+   edge_iterator ei;
+   edge e;
+   int sum;
+   gcov_type lsum;
+ 
+   if (index == -1)
+ return;
+   if (!profile_record)
+ profile_record = XCNEWVEC (struct profile_record,
+  passes_by_id_size);
+   gcc_assert (index < passes_by_id_size && index >= 0);
+   gcc_assert (subpass < 2);
+   profile_record[index].run |= run;
+ 
+   FOR_ALL_BB (bb)
+{
+   if (bb != EXIT_BLOCK_PTR_FOR_FUNCTION (cfun)
+ && profile_status != PROFILE_ABSENT)
+   {
+ sum = 0;
+ FOR_EACH_EDGE (e, ei, bb->succs)
+   sum += e->probability;
+ if (EDGE_COUNT (bb->succs) && abs (sum - REG_BR_PROB_BASE) > 100)
+   profile_record[index].num_mismatched_freq_out[subpass]++;
+ lsum = 0;
+ FOR_EACH_EDGE (e, ei, bb->succs)
+   lsum += e->count;
+ if (EDGE_COUNT (bb->succs)
+ && (lsum - bb->count > 100 || lsum - bb->count < -100))
+   profile_record[index].num_mismatched_count_out[subpass]++;
+   }
+   if (bb != ENTRY_BLOCK_PTR_FOR_FUNCTION (cfun)
+ && profile_status != PROFILE_ABSENT)
+   {
+ sum = 0;
+ FOR_EACH_EDGE (e, ei, bb->preds)
+   sum += EDGE_FREQUENCY (e);
+ if (abs (sum - bb->frequency) > 100
+ || (MAX (sum, bb->frequency) > 10
+ && abs ((sum - bb->frequency) * 100 / (MAX (sum, 
bb->frequency) + 1)) > 10))
+   profile_record[index].num_mismatched_freq_in[subpass]++;
+ lsum = 0;
+ FOR_EACH_EDGE (e, ei, bb->preds)
+   lsum += e->count;
+ if (lsum - bb->count > 100 || lsum - bb->count < -100)
+   profile_record[index].num_mismatched_count_in[subpass]++;
+   }
+   if (bb == ENTRY_BLOCK_PTR_FOR_FUNCTION (cfun)
+ || bb == EXIT_BLOCK_PTR_FOR_FUNCTION (cfun))
+   continue;
+   if ((cfun && (cfun->curr_properties & PROP_trees)))
+   {
+ gimple_stmt_iterator i;
+ 
+ for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next (&i))
+   {
+ profile_record[index].size[subpass]
++= estimate_num_insns (gsi_stmt (i), &eni_size_weights);
+ if (profile_status == PROFILE_READ)
+   profile_record[index].time[subpass]
+  += estimate_num_insns (gsi_stmt (i),
+ &eni_time_weights) * bb->count;
+ else if (profile_status == PROFILE_GUESSED)
+   profile_record[index].time[subpass]
+  += estimate_num_insns (gsi_stmt (i),
+ &eni_time_weights) * bb->frequency;
+   }
+   }
+   else if (cfun && (cfun->curr_properties & PROP_rtl))
+   {
+ rtx insn;
+ for (insn = NEXT_INSN (BB_HEAD (bb)); insn && insn != NEXT_INSN 
(BB_END (bb));
+  insn = NEXT_INSN (insn))
+   if (INSN_P (insn))
+ {
+   profile_record[index].size[subpass]
+  += insn_rtx_cost (PATTERN (insn), false);
+   if (profile_status == PROFILE_READ)
+ profile_record[index].time[subpass]
++= insn_rtx_cost (PATTERN (insn), true) * bb->count;
+   else if (profile_status == PROFILE_GUESSED)
+ profile_record[index].time[subpass]
++= insn_rtx_cost (PATTERN (insn), true) * bb->frequency;
+ }
+   }
+}
+ }
+ 
+ /* Output profile consistency.  */
+ 
+ void
+ dump_pro

Re: [wwwdocs] SH 4.8 changes update

2012-10-06 Thread Gerald Pfeifer
Hi Oleg,

have you considered also documenting the new __builtin_thread_pointer
and __builtin_set_thread_pointer built-ins?  (I just noticed this by
chance.)

On Thu, 4 Oct 2012, Oleg Endo wrote:
> The atomic options of SH have been changed recently.  The attached patch 
> updates the 4.8 changes.html accordingly, plus some minor wording fixes.

Looking at the list of parameters to -matomic-model=, it appears
that a definition list () is better suitable than a regular
list, so I went ahead and made the change below.

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.37
diff -u -3 -p -r1.37 changes.html
--- changes.html5 Oct 2012 19:30:26 -   1.37
+++ changes.html6 Oct 2012 15:32:05 -
@@ -258,30 +258,30 @@ by this change.
   A new option -matomic-model=model selects the
   model for the generated atomic sequences.  The following models are
   supported:
-  
-soft-gusa
+  
+soft-gusa
 Software gUSA sequences (SH3* and SH4* only).  On SH4A targets this
 will now also partially utilize the movco.l and
 movli.l instructions.  This is the default when the target
-is sh3*-*-linux* or sh4*-*-linux*.
+is sh3*-*-linux* or sh4*-*-linux*.
 
-hard-llcs
+hard-llcs
 Hardware movco.l / movli.l sequences
-(SH4A only).
+(SH4A only).
 
-soft-tcb
-Software thread control block sequences.
+soft-tcb
+Software thread control block sequences.
 
-soft-imask
-Software interrupt flipping sequences (privileged mode only).  This is
-the default when the target is sh1*-*-linux* or
-sh2*-*-linux*.
+soft-imask
+Software interrupt flipping sequences (privileged mode only).  This
+is the default when the target is sh1*-*-linux* or
+sh2*-*-linux*.
 
-none
+none
 Generates function calls to the respective __atomic
-built-in functions.  This is the default for SH64 targets or when the
-target is not sh*-*-linux*.
-  
+built-in functions.  This is the default for SH64 targets or when
+the target is not sh*-*-linux*.
+  
 
   The option -msoft-atomic has been deprecated.  It is
   now an alias for -matomic-model=soft-gusa.


Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-06 Thread Dehao Chen
ping^2

Honza, do you think this patch can make into 4.8 stage 1?

Thanks,
Dehao

On Wed, Sep 26, 2012 at 2:34 PM, Dehao Chen  wrote:
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01975.html
>
> Thanks,
> Dehao


Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-06 Thread Jan Hubicka
> ping^2
> 
> Honza, do you think this patch can make into 4.8 stage 1?

+  if (check_value_one ^ integer_onep (val))

Probably better as != 
(especially because GNU coding standard allows predicates to return more than
just boolean)


+{
+  edge e1;
+  edge_iterator ei;
+  tree val = gimple_phi_arg_def (phi_stmt, i);
+  edge e = gimple_phi_arg_edge (phi_stmt, i);
+
+  if (!TREE_CONSTANT (val) || !(integer_zerop (val) || integer_onep (val)))
+   continue;
+  if (check_value_one ^ integer_onep (val))
+   continue;
+  if (VEC_length (edge, e->src->succs) != 1)
+   {
+ if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED)
+ && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS)
+ && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT))
+   predict_edge_def (e, PRED_LOOP_EXIT, NOT_TAKEN);
+ continue;
+   }
+
+  FOR_EACH_EDGE (e1, ei, e->src->preds)
+   if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED)
+   && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS)
+   && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT))
+ predict_edge_def (e1, PRED_LOOP_EXIT, NOT_TAKEN);

Here you found an edge that you know is going to terminate the loop
and you want to predict all paths to this edge as unlikely.
Perhaps you want to use predict paths leading_to_edge for edge?

You do not need to check PRED_LOOP_ITERATIONS and PRED_LOOP_ITERATIONS_GUESSED
because those never go to the non-exit edges. 

The nature of predict_paths_for_bb type heuristic is that they are not really
additive: if the path leads to two different aborts it does not make it more
sure that it will be unlikely.  So perhaps you can add !predicted_by_p (e, pred)
prior predict_edge_def call in the function?

I wonder if we did VRP just before branch predction to jump thread the shortcut
condtions into loopback edges, would be there still cases where this
optimization will match?

Honza


Drop V1 plugin API hack

2012-10-06 Thread Jan Hubicka
Hi,
this patch removes V1 API hack, making us to require V2 plugin api aware linker
to get sane code quality on anything using COMDAT.  Given that the V1 API
linkers are quite old now, I think it is safe.  I added a testcase that should
fail with old linker and will update changes.html.

The patch also cleans up the code for new symbol table APIs.  Nice effect is
that inliner is now more aggressive about inlining comdats and thus tramp3d
scoes are improved with -flto once again.

lto-bootstrapped/regtested x86_64, comitted.

Honza
PR lto/53831
PR lto/54776
* g++.dg/lto/v1-plugin-api-not-supported.C: New testcase.
* lto-streamer-out.c (produce_symtab): Cleanup; drop v1 API hack.

Index: testsuite/g++.dg/lto/v1-plugin-api-not-supported.C
===
--- testsuite/g++.dg/lto/v1-plugin-api-not-supported.C  (revision 0)
+++ testsuite/g++.dg/lto/v1-plugin-api-not-supported.C  (revision 0)
@@ -0,0 +1,59 @@
+// { dg-lto-do run }
+// { dg-require-linker-plugin "" }
+// { dg-lto-options {{-O2 -fuse-linker-plugin -fno-early-inlining}}
+
+extern "C" void abort (void);
+extern "C" void linker_error ();
+
+class A
+{
+public:
+  int data;
+  virtual int foo (int i)
+{
+  return i + 1;
+}
+};
+
+class B : public A
+{
+public:
+  virtual int foo (int i)
+{
+  return i + 2;
+}
+};
+
+class C : public A
+{
+public:
+  virtual int foo (int i)
+{
+  linker_error ();
+  return i + 3;
+}
+};
+
+
+static int middleman (class A *obj, int i)
+{
+  return obj->foo (i);
+}
+
+int __attribute__ ((noinline,noclone)) get_input(void)
+{
+  return 1;
+}
+
+int main (int argc, char *argv[])
+{
+  class B b;
+  if (middleman (&b, get_input ()) != 3)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-ipa-dump "Discovered a virtual call to a known 
target.*B::foo"  "cp"  } } */
+/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "optimized"} } */
+/* { dg-final { cleanup-ipa-dump "cp" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 192154)
+++ lto-streamer-out.c  (working copy)
@@ -1285,11 +1287,9 @@ produce_symtab (struct output_block *ob)
   struct streamer_tree_cache_d *cache = ob->writer_cache;
   char *section_name = lto_get_section_name (LTO_section_symtab, NULL, NULL);
   struct pointer_set_t *seen;
-  struct cgraph_node *node;
-  struct varpool_node *vnode;
   struct lto_output_stream stream;
   lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
-  int i;
+  lto_symtab_encoder_iterator lsei;
 
   lto_begin_section (section_name, false);
   free (section_name);
@@ -1297,78 +1297,26 @@ produce_symtab (struct output_block *ob)
   seen = pointer_set_create ();
   memset (&stream, 0, sizeof (stream));
 
-  /* Write all functions. 
- First write all defined functions and then write all used functions.
- This is done so only to handle duplicated symbols in cgraph.  */
-  for (i = 0; i < lto_symtab_encoder_size (encoder); i++)
+  /* Write the symbol table.
+ First write everything defined and then all declarations.
+ This is neccesary to handle cases where we have duplicated symbols.  */
+  for (lsei = lsei_start (encoder);
+   !lsei_end_p (lsei); lsei_next (&lsei))
 {
-  if (!symtab_function_p (lto_symtab_encoder_deref (encoder, i)))
-   continue;
-  node = cgraph (lto_symtab_encoder_deref (encoder, i));
-  if (DECL_EXTERNAL (node->symbol.decl))
-   continue;
-  if (DECL_COMDAT (node->symbol.decl)
- && cgraph_comdat_can_be_unshared_p (node))
-   continue;
-  if ((node->alias && !node->thunk.alias) || node->global.inlined_to)
+  symtab_node node = lsei_node (lsei);
+
+  if (!symtab_real_symbol_p (node) || DECL_EXTERNAL (node->symbol.decl))
continue;
   write_symbol (cache, &stream, node->symbol.decl, seen, false);
 }
-  for (i = 0; i < lto_symtab_encoder_size (encoder); i++)
+  for (lsei = lsei_start (encoder);
+   !lsei_end_p (lsei); lsei_next (&lsei))
 {
-  if (!symtab_function_p (lto_symtab_encoder_deref (encoder, i)))
-   continue;
-  node = cgraph (lto_symtab_encoder_deref (encoder, i));
-  if (!DECL_EXTERNAL (node->symbol.decl))
-   continue;
-  /* We keep around unused extern inlines in order to be able to inline
-them indirectly or via vtables.  Do not output them to symbol
-table: they end up being undefined and just consume space.  */
-  if (!node->symbol.address_taken && !node->callers)
-   continue;
-  if (DECL_COMDAT (node->symbol.decl)
- && cgraph_comdat_can_be_unshared_p (node))
-   continue;
-  if ((node->alias && !node->thunk.alias) || node->global.inlined_to)
-   continue;
-  write_symbol (cache, &stream, node->symbol.decl, seen, false);
-}
+  symtab_node node

Re: [wwwdocs] SH 4.8 changes update

2012-10-06 Thread Oleg Endo
On Sat, 2012-10-06 at 17:57 +0200, Gerald Pfeifer wrote:
> Hi Oleg,
> 
> have you considered also documenting the new __builtin_thread_pointer
> and __builtin_set_thread_pointer built-ins?  (I just noticed this by
> chance.)

Yes, sure.  The documentation and www changes updates will follow soon.

> 
> On Thu, 4 Oct 2012, Oleg Endo wrote:
> > The atomic options of SH have been changed recently.  The attached patch 
> > updates the 4.8 changes.html accordingly, plus some minor wording fixes.
> 
> Looking at the list of parameters to -matomic-model=, it appears
> that a definition list () is better suitable than a regular
> list, so I went ahead and made the change below.
> 

Thanks!  Looks way better now.

Cheers,
Oleg



[RFA] Support common C++ declarations inside GTY'd structures

2012-10-06 Thread Diego Novillo
This patch combines the changes from
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg02016.html with other
additions to support C++ inside GTY'd structures.

The main changes wrt Aaron's original patch are:

- Support for function declarations inside classes.

- Support scoping in identifiers.  This does not mean that gengtype
  supports scopes, it just knows that 'Foo::id' is a single entity.

- Explicit non-support for typedef and enum inside class/struct.
  Since gengtype does not really know about scopes, it cannot
  understand these types, but it knows enough to recognize and reject
  them.  GTY'd struct/class that need to typedef their own types
  should use GTY((user)).

- Documentation on what is and is not supported.

There is one check I needed to remove that gave me some trouble.
When a ctor is detected, we have already parsed the name of the
ctor as a type, which is then registered in the list of structures.

We go on to recognize it as a ctor *after* the type has been
registered.  We reject the field in declarator() and it is never
added to the list of fields for the class.

However, when we reach the end of the class, we find that the
type we created while parsing the ctor has line number
information in it (the line where the ctor was) and gengtype
thinks that it is a duplicate structure definition.

I took out this check for two reasons: (a) It is actually
unnecessary because if there were really duplicate definitions of
this structure, the code would not compile, and (b) all the other
alternatives required making the parser much more convoluted and
I'm trying hard not to make gengtype parser too smart.

Tested on x86_64.  OK for trunk?

2012-10-05  Aaron Gray 
Diego Novillo 

* gengtype-lex.l: Support for C++ single line comments.
Support for classes.
(CXX_KEYWORD): New.  Support C++ keywords inline, public,
protected, private, template, operator, friend, &, ~.
(TYPEDEF): New.  Support typedef.
* gengtype-parser.c: updated 'token_names[]'
(direct_declarator): Add support for parsing functions
and ctors.

2012-10-05  Diego Novillo  

* doc/gty.texi: Document C++ limitations in gengtype.
* gengtype-lex.l (CID): Rename from ID.
(ID): Include scoping '::' as part of the identifier name.
* gengtype-parse.c (token_names): Update.
(token_value_format): Update.
(consume_until_eos): Rename from consume_until_semi.
Remove unused argument IMMEDIATE.  Update all callers.
Also consider '}' as a finalizer.
(consume_until_comma_or_eos): Rename from
consume_until_comma_or_semi.
Remove unused argument IMMEDIATE.  Update all callers.
Also consider '}' as a finalizer.
(direct_declarator): Add documentation on ctor support.
Add argument IN_STRUCT.
If the token following ID is a '(', consider ID a
function and return NULL.
If the token following '(' is not a '*', and IN_STRUCT is
true, conclude that this is a ctor and return NULL.
If the token is IGNORABLE_CXX_KEYWORD, return NULL.
(inner_declarator): Add argument IN_STRUCT.
Update all callers.
(declarator): Add argument IN_STRUCT with default value
false.  Update all callers.
(type): Document argument NESTED.
Skip over C++ inheritance specifiers.
If a token TYPEDEF is found, emit an error.
If an enum is found inside a class/structure, emit an
error.
(new_structure): Do not complain about duplicate
structures if S has a line location set.
* gengtype-state.c (write_state_type): Remove default
handler.  Add handler for TYPE_NONE.
(read_state_scalar_char_type):
* gengtype.c: Fix spacing.
* gengtype.h (enum gty_token): Add name.  Add token
IGNORABLE_CXX_KEYWORD.

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index b2214b8..ea1a928 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -65,6 +65,27 @@ The parser understands simple typedefs such as
 @code{typedef int @var{name};}.
 These don't need to be marked.
 
+Since @code{gengtype}'s understanding of C++ is limited, there are
+several constructs and declarations that are not supported inside
+classes/structures marked for automatic GC code generation.  The
+following C++ constructs produce a @code{gengtype} error on
+structures/classes marked for automatic GC code generation:
+
+@itemize @bullet
+@item
+Type definitions inside classes/structures are not supported.
+@item
+Enumerations inside classes/structures are not supported.
+@end itemize
+
+If you have a class or structure using any of the above constructs,
+you need to mark that class as @code{GTY ((user))} and provide your
+own marking routines (see section @ref{User GC} for details).
+
+It is always valid to include function definitions inside classes.
+Those are always ignored by @code{gengtype}

Re: [google] Emit relative addresses to function patch sections instead of absolute addresses. (issue6572065)

2012-10-06 Thread Xinliang David Li
xray feature is not in trunk yet.

David

On Fri, Oct 5, 2012 at 3:53 PM, Diego Novillo  wrote:
> Harshit, why didn't you propose this patch for trunk?  Why should we
> make it a google-local patch?
>
>
> Diego.
>
> On Fri, Sep 28, 2012 at 5:24 AM, Harshit Chopra  wrote:
>> commit fc3a55ccec9bc770c79f8a221f5abd397befc8f6
>> Author: Harshit Chopra 
>> Date:   Thu Sep 20 17:49:59 2012 -0700
>>
>> Instead of emitting absolute addresses to the function patch sections, 
>> emit relative addresses. Absolute addresses might require relocation, which 
>> is time consuming and fraught with other issues.
>>
>> M   gcc/config/i386/i386.c
>>
>> Tested:
>>   Ran make check-gcc and manually confirmed that the affected tests pass.
>>
>> ChangeLog:
>>
>> 2012-09-28  Harshit Chopra  
>>
>> * gcc/config/i386/i386.c 
>> (ix86_output_function_nops_prologue_epilogue): Emit relative address to 
>> function patch sections.
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index f72b0b5..8c9334f 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -11098,7 +11098,7 @@ ix86_output_function_nops_prologue_epilogue (FILE 
>> *file,
>> $LFPEL0:
>>   
>>   0x90 (repeated num_actual_nops times)
>> - .quad $LFPESL0
>> + .quad $LFPESL0 - .
>>   followed by section 'section_name' which contains the address
>>   of instruction at 'label'.
>> */
>> @@ -0,7 +0,10 @@ ix86_output_function_nops_prologue_epilogue (FILE 
>> *file,
>>  asm_fprintf (file, ASM_BYTE"0x90\n");
>>
>>fprintf (file, ASM_QUAD);
>> +  /* Output "section_label - ." for the relative address of the entry in
>> + the section 'section_name'.  */
>>assemble_name_raw (file, section_label);
>> +  fprintf (file, " - .");
>>fprintf (file, "\n");
>>
>>/* Emit the backpointer section. For functions belonging to comdat group,
>> @@ -11144,7 +11147,7 @@ ix86_output_function_nops_prologue_epilogue (FILE 
>> *file,
>>   .quad $LFPEL0
>> */
>>ASM_OUTPUT_INTERNAL_LABEL (file, section_label);
>> -  fprintf(file, ASM_QUAD"\t");
>> +  fprintf(file, ASM_QUAD);
>>assemble_name_raw (file, label);
>>fprintf (file, "\n");
>>
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/6572065


Re: *ping* [patch, libfortran] Fix PR 54736, memory corruption with GFORTRAN_CONVERT_UNIT

2012-10-06 Thread Thomas Koenig

Hi Steven,

Ping?

I would like to start committing patches so I don't have too many
of them in my tree at the same time :-)


This looks OK to me.


Committed as rev. 192158.  I'll commit to 4.7 and after that to 4.6
in a few days.

Thanks a lot for the review!

Thomas



Re: [google] Emit relative addresses to function patch sections instead of absolute addresses. (issue6572065)

2012-10-06 Thread Xinliang David Li
Ok for google branches.

Please consider resend the original xray patch to trunk (gcc-4_8) You
need to make the runtime bits available publicly though.

thanks,

David

On Fri, Sep 28, 2012 at 2:24 AM, Harshit Chopra  wrote:
> commit fc3a55ccec9bc770c79f8a221f5abd397befc8f6
> Author: Harshit Chopra 
> Date:   Thu Sep 20 17:49:59 2012 -0700
>
> Instead of emitting absolute addresses to the function patch sections, 
> emit relative addresses. Absolute addresses might require relocation, which 
> is time consuming and fraught with other issues.
>
> M   gcc/config/i386/i386.c
>
> Tested:
>   Ran make check-gcc and manually confirmed that the affected tests pass.
>
> ChangeLog:
>
> 2012-09-28  Harshit Chopra  
>
> * gcc/config/i386/i386.c 
> (ix86_output_function_nops_prologue_epilogue): Emit relative address to 
> function patch sections.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index f72b0b5..8c9334f 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -11098,7 +11098,7 @@ ix86_output_function_nops_prologue_epilogue (FILE 
> *file,
> $LFPEL0:
>   
>   0x90 (repeated num_actual_nops times)
> - .quad $LFPESL0
> + .quad $LFPESL0 - .
>   followed by section 'section_name' which contains the address
>   of instruction at 'label'.
> */
> @@ -0,7 +0,10 @@ ix86_output_function_nops_prologue_epilogue (FILE 
> *file,
>  asm_fprintf (file, ASM_BYTE"0x90\n");
>
>fprintf (file, ASM_QUAD);
> +  /* Output "section_label - ." for the relative address of the entry in
> + the section 'section_name'.  */
>assemble_name_raw (file, section_label);
> +  fprintf (file, " - .");
>fprintf (file, "\n");
>
>/* Emit the backpointer section. For functions belonging to comdat group,
> @@ -11144,7 +11147,7 @@ ix86_output_function_nops_prologue_epilogue (FILE 
> *file,
>   .quad $LFPEL0
> */
>ASM_OUTPUT_INTERNAL_LABEL (file, section_label);
> -  fprintf(file, ASM_QUAD"\t");
> +  fprintf(file, ASM_QUAD);
>assemble_name_raw (file, label);
>fprintf (file, "\n");
>
>
> --
> This patch is available for review at http://codereview.appspot.com/6572065


handle isl and cloog in contrib/download_prerequisites

2012-10-06 Thread Manuel López-Ibáñez
Hi,

GCC now requires ISL and a very new CLOOG but download_prerequisites
does not download those. Also, there is only one sensible place to
call this script from, so check for that. Also, factorize a bit the
code.

Tested by using it in the the GCC Compile Farm. I tried to build and
install CLOOG and ISL by myself into /opt/cfarm/ but that breaks in
some strange ways. Using the script, bootstrap works just fine.

In some dream universe, configure would detect that the prerequisites
are missing or incorrect very early, give a sensible message and
suggest to download them automatically, and then just do that.

OK for trunk?

Cheers,

Manuel.


downpre.changelog
Description: Binary data


downpre.diff
Description: Binary data


Re: [google] AutoFDO implementation

2012-10-06 Thread Jan Hubicka
> Hi,
> 
> This patch implements the fine-graind AutoFDO optimizations for GCC.
> It uses linux perf to collect sample profiles, and uses debug info to
> represent the profile. In GCC, it uses the profile to annotate CFG to
> drive FDO. This can bring 50% to 110% of the speedup derived by
> traditional instrumentation based FDO. (Average is between 70% to 80%
> for many CPU intensive applications). Comparing with traditional FDO,
> AutoFDO does not require instrumentation. It just need to have an
> optimized binary with debug info to collect the profile.
> 
> This patch has passed bootstrap and gcc regression tests as well as
> tested with crosstool. Okay for google branches?
> 
> If people in up-stream find this feature interesting, I'll spend some
> time to port this to trunk and try to opensource the tool to generate
> profile data file.

I think it is useful feature, yes (and was in my TODO list for quite some
time). Unlike edge profiles, these profiles should be also more independent of
source code/configuration changes.

Just few quick questions from first glance over the patch...
> 
> Dehao
> 
> The patch can also be viewed from:
> 
> http://codereview.appspot.com/6567079
> 
> gcc/ChangeLog.google-4_7:
> 2012-09-28  Dehao Chen  
> 
> * cgraphbuild.c (build_cgraph_edges): Handle AutoFDO profile.
> (rebuild_cgraph_edges): Likewise.
> * cgraph.c (cgraph_clone_node): Likewise.
> (clone_function_name): Likewise.
> * cgraph.h (cgraph_node): New field.
> * tree-pass.h (pass_ipa_auto_profile): New pass.
> * cfghooks.c (make_forwarder_block): Handle AutoFDO profile.
> * ipa-inline-transform.c (clone_inlined_nodes): Likewise.
> * toplev.c (compile_file): Likewise.
> (process_options): Likewise.
> * debug.h (auto_profile_debug_hooks): New.
> * cgraphunit.c (cgraph_finalize_compilation_unit): Handle AutoFDO
> profile.
> (cgraph_copy_node_for_versioning): Likewise.
> * regs.h (REG_FREQ_FROM_BB): Likewise.
> * gcov-io.h: (GCOV_TAG_AFDO_FILE_NAMES): New.
> (GCOV_TAG_AFDO_FUNCTION): New.
> (GCOV_TAG_AFDO_MODULE_GROUPING): New.
> * ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Handle AutoFDO profile.
> * ipa-inline.c (edge_hot_enough_p): Likewise.
> (edge_badness): Likewise.
> (inline_small_functions): Likewise.
> * dwarf2out.c (auto_profile_debug_hooks): New.
> * opts.c (common_handle_option): Handle AutoFDO profile.
> * timevar.def (TV_IPA_AUTOFDO): New.
> * predict.c (compute_function_frequency): Handle AutoFDO profile.
> (rebuild_frequencies): Handle AutoFDO profile.
> * auto-profile.c (struct gcov_callsite_pos): New.
> (struct gcov_callsite): New.
> (struct gcov_stack): New.
> (struct gcov_function): New.
> (struct afdo_bfd_name): New.
> (struct afdo_module): New.
> (afdo_get_filename): New.
> (afdo_get_original_name_size): New.
> (afdo_get_bfd_name): New.
> (afdo_read_bfd_names): New.
> (afdo_stack_hash): New.
> (afdo_stack_eq): New.
> (afdo_function_hash): New.
> (afdo_function_eq): New.
> (afdo_bfd_name_hash): New.
> (afdo_bfd_name_eq): New.
> (afdo_bfd_name_del): New.
> (afdo_module_hash): New.
> (afdo_module_eq): New.
> (afdo_module_num_strings): New.
> (afdo_add_module): New.
> (read_aux_modules): New.
> (get_inline_stack_size_by_stmt): New.
> (get_inline_stack_size_by_edge): New.
> (get_function_name_from_block): New.
> (get_inline_stack_by_stmt): New.
> (get_inline_stack_by_edge): New.
> (afdo_get_function_count): New.
> (afdo_set_current_function_count): New.
> (afdo_add_bfd_name_mapping): New.
> (afdo_add_copy_scale): New.
> (get_stack_count): New.
> (get_stmt_count): New.
> (afdo_get_callsite_count): New.
> (afdo_get_bb_count): New.
> (afdo_annotate_cfg): New.
> (read_profile): New.
> (process_auto_profile): New.
> (init_auto_profile): New.
> (end_auto_profile): New.
> (afdo_find_equiv_class): New.
> (afdo_propagate_single_edge): New.
> (afdo_propagate_multi_edge): New.
> (afdo_propagate_circuit): New.
> (afdo_propagate): New.
> (afdo_calculate_branch_prob): New.
> (auto_profile): New.
> (gate_auto_profile_ipa): New.
> (struct simple_ipa_opt_pass): New.
> * auto-profile.h (init_auto_profile): New.
> (end_auto_profile): New.
> (process_auto_profile): New.
> (afdo_set_current_function_count): New.
> (afdo_add_bfd_name_mapping): New.
> (afdo_add_copy_scale): New.
> (afdo_calculate_branch_prob): New.
> (afdo_get_callsite_count): New.
> (afdo_get_bb_count): New.
> * profile.c (compute_branch_probabilities): Handle AutoFDO profile.
> (branch_prob): Likeise.
> * loop-unroll.c (decide_unroll_runtime_iterations): Likewise.
> * coverage.c (coverage_init): Likewise.
> * tree-ssa-live.c (remove_unused_scope_block_p): Likewise.
> * common.opt (fauto-profile): New.
> * tree-inline.c (copy_bb): Handle AutoFDO profile.
> (copy_edges_for_bb): Likewise.
> (copy_cfg_body): Likewise.
> * tree-profile.c (direct_call_profiling): Likewise.
> (gate_tree_profile_ipa): Likewise.
> * basic-block.h (EDGE_ANNOTATED): New field.
> (BB_ANNOTATED): New field.
> * tree-cfg.c (gimple_merge_blocks): Handle AutoFDO profile.
> * passes.c (init_o

Re: handle isl and cloog in contrib/download_prerequisites

2012-10-06 Thread Diego Novillo

On 2012-10-06 13:30 , Manuel López-Ibáñez wrote:


+download_prerequisite() {
+WHAT=$1
+VERSION=$2
+PACK=$3
+case $PACK in
+tar.bz2) TARX="j";;
+tar.gz)  TARX="z";;
+esac


GNU tar has not needed 'j' or 'z' for a while.  Not sure whether other 
tars have the same feature, though.


The patch is OK with a minor change:


+wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$WHAT-${VERSION}.${PACK} || 
exit 1
+tar x${TARX}f $WHAT-${VERSION}.${PACK} || exit 1
+ln -sf $WHAT-${VERSION} $WHAT || exit 1
+rm $WHAT-${VERSION}.${PACK} || exit 1
+}
+
+if [ ! -e ./contrib/download_prerequisites ]; then


Better use -x here.


Diego.


Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #3

2012-10-06 Thread Gunther Nikl
Michael Meissner wrote:
> @@ -10326,7 +10352,7 @@ static rtx
>  altivec_expand_ld_builtin (tree exp, rtx target, bool *expandedp)
>  {
>tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> -  unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
> +  enum rs6000_builtins fcode = (enum rs6000_builtins)
DECL_FUNCTION_CODE (fndecl);
>tree arg0;
>enum machine_mode tmode, mode0;
>rtx pat, op0;

There are several changes like that in your patch. Shouldn't these
changes be done in a separate patch?

Regards,
Gunther


Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #2c

2012-10-06 Thread Gunther Nikl
Michael Meissner wrote:
> On Thu, Oct 04, 2012 at 06:33:33PM +0200, Gunther Nikl wrote:
>> Michael Meissner schrieb:
>>> On Tue, Oct 02, 2012 at 10:13:25AM +0200, Gunther Nikl wrote:
 Michael Meissner wrote:
> Segher Boessenkool asked me on IRC to break out the fix in the last 
> change.
> This patch is just the change to set the default options if the user did 
> not
> use -mcpu= and the compiler was not configured with --with-cpu=.
> Here are the patches.
 Which GCC releases are affected by this bug?
>>> All of them.
>> So this bug is as old as the rs6000 port has PowerPC support? Then GCC
>> 2.95 is also affected?

I took a closer look. AFAICT, the above mentioned bug is a result of
the rewritten option parsing for GCC 4.6. And that was what I wanted
to know. Since this is a regression, should this be fixed in 4.6/4.7?

> Well as I said, it is pretty latent, and most people never have noticed it.
> It really depends on what the options are whether you run into the problem.

Yes, I understood that but this did not address my question.

Regards,
Gunther


Re: [patch] fix libbacktrace build failure on arm-linux

2012-10-06 Thread Ian Lance Taylor
On Sat, Oct 6, 2012 at 8:09 AM, Matthias Klose  wrote:
> current trunk fails to build on arm-linux with:
>
> In file included from ../../../../src/libbacktrace/backtrace.c:35:0:
> ../libgcc/unwind.h: In function '_Unwind_decode_typeinfo_ptr':
> ../libgcc/unwind.h:42:45: error: unused parameter 'base' 
> [-Werror=unused-parameter]
>_Unwind_decode_typeinfo_ptr (_Unwind_Word base, _Unwind_Word ptr)
>  ^
> ../libgcc/unwind.h: In function '__gnu_unwind_24bit':
> ../libgcc/unwind.h:68:41: error: unused parameter 'context'
> [-Werror=unused-parameter]
>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>  ^
> ../libgcc/unwind.h:68:54: error: unused parameter 'data' 
> [-Werror=unused-parameter]
>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>   ^
> ../libgcc/unwind.h:68:64: error: unused parameter 'compact'
> [-Werror=unused-parameter]
>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
> ^
> cc1: all warnings being treated as errors
> make[8]: *** [backtrace.lo] Error 1
>
> the immediate fix is to mark all arguments as unused, however I don't know if
> this function should be used by libbacktrace, if it returns _URC_FAILURE
> unconditionally.

The function is not used by libbacktrace.  It's an inline function
defined in the header file, and the warning is about the inline
function definition.  That is, this is a libgcc bug, not a
libbacktrace bug, it just happens to show up when compiling
libbacktrace for ARM


> * config/arm/unwind-arm.h (__gnu_unwind_24bit): Mark parameters
> as unused.
>
>
> --- libgcc/config/arm/unwind-arm.h  (revision 192162)
> +++ libgcc/config/arm/unwind-arm.h  (working copy)
> @@ -64,8 +64,11 @@
>return tmp;
>  }
>
> +#define __unused __attribute__((unused))
> +
>static inline _Unwind_Reason_Code
> -  __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
> +  __gnu_unwind_24bit (_Unwind_Context * context __unused, _uw data __unused,
> +  int compact __unused)
>  {
>return _URC_FAILURE;
>  }

Don't #define __unused.  Just write __attribute__ ((unused)) on the
parameters.  Break the lines as needed.

This is OK with that change.

Thanks.

Ian


[Patch, libfortran] Fix usage of secure_getenv with glibc 2.17

2012-10-06 Thread Janne Blomqvist
Hi,

glibc 2.17 has renamed the exported but ostensibly private symbol
__secure_getenv to secure_getenv. The attached patch updates
libgfortran to first check for the presence of this new symbol. See
http://sourceware.org/glibc/wiki/Tips_and_Tricks/secure_getenv for
more details.

Committed as obvious.

2012-10-06  Janne Blomqvist  

* configure.ac: Check for presence of secure_getenv.
* libgfortran.h: Use HAVE_SECURE_GETENV.
* Makefile.in: Regenerated.
* aclocal.m4: Regenerated.
* config.h.in: Regenerated.
* configure: Regenerated.



-- 
Janne Blomqvist


secure_getenv.diff
Description: Binary data


[Patch, Fortran] PR 40453: Enhanced (recursive) argument checking

2012-10-06 Thread Janus Weil
Hi all,

here is a rather straightforward patch, which does 'recursive'
checking of dummy procedures.

Regtested on x86_64-unknown-linux-gnu. Ok for trunk?

Cheers,
Janus


2012-10-06  Janus Weil  

PR fortran/40453
* interface.c (check_dummy_characteristics): Recursively check dummy
procedures.

2012-10-06  Janus Weil  

PR fortran/40453
* gfortran.dg/dummy_procedure_9.f90: New.


pr40453.diff
Description: Binary data


dummy_procedure_9.f90
Description: Binary data


Re: [TILE-Gx, committed] support -mcmodel=MODEL

2012-10-06 Thread Gerald Pfeifer
On Fri, 14 Sep 2012, Walter Lee wrote:
> How does this look:

Good.  Just, for some reason, the mail did not appear in my inbox
and I was backlogged on gcc-patches; sorry for missing it.

Here is a small markup change I applied on the wwwdocs side on
top of your patch.

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.38
diff -u -3 -p -r1.38 changes.html
--- changes.html6 Oct 2012 15:47:44 -   1.38
+++ changes.html6 Oct 2012 22:19:27 -
@@ -348,8 +348,9 @@ by this change.
 TILE-Gx
 
   
-Added support for the -mcmodel=MODEL command-line option.  The
-models supported are small and large.
+Added support for the -mcmodel=MODEL
+command-line option.  The models supported are small
+and large.
   
 
 XStormy16


Re: handle isl and cloog in contrib/download_prerequisites

2012-10-06 Thread NightStrike
On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez
 wrote:
> Hi,
>
> GCC now requires ISL and a very new CLOOG but download_prerequisites
> does not download those. Also, there is only one sensible place to

As of what version is isl/cloog no longer optional?


Re: [google] AutoFDO implementation

2012-10-06 Thread Andi Kleen
Jan Hubicka  writes:
>
> I think it is useful feature, yes (and was in my TODO list for quite some
> time). Unlike edge profiles, these profiles should be also more independent of
> source code/configuration changes.

It would be good to look at the tool, but:

- Does it use the perf LBR support to estimate the last 16 basic blocks
for each sample?

- It doesn't seem to use callgraphs for estimating common indirect calls
for devirtualization? 

I always disliked the limitation of the current indirect call profiling
to only support a single translation unit. Using callgraphs around
that would be quite nice.

This would only work for non tail calls and with frame pointers, or
using LBRs again.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [google] AutoFDO implementation

2012-10-06 Thread Dehao Chen
On Sat, Oct 6, 2012 at 10:55 AM, Jan Hubicka  wrote:
>> Hi,
>>
>> This patch implements the fine-graind AutoFDO optimizations for GCC.
>> It uses linux perf to collect sample profiles, and uses debug info to
>> represent the profile. In GCC, it uses the profile to annotate CFG to
>> drive FDO. This can bring 50% to 110% of the speedup derived by
>> traditional instrumentation based FDO. (Average is between 70% to 80%
>> for many CPU intensive applications). Comparing with traditional FDO,
>> AutoFDO does not require instrumentation. It just need to have an
>> optimized binary with debug info to collect the profile.
>>
>> This patch has passed bootstrap and gcc regression tests as well as
>> tested with crosstool. Okay for google branches?
>>
>> If people in up-stream find this feature interesting, I'll spend some
>> time to port this to trunk and try to opensource the tool to generate
>> profile data file.
>
> I think it is useful feature, yes (and was in my TODO list for quite some
> time). Unlike edge profiles, these profiles should be also more independent of
> source code/configuration changes.

Thanks for your feedback and interest. Yes, in AutoFDO the coupling
between the profiling build and fdo build are much loosen.

>
> Just few quick questions from first glance over the patch...
>>
>> Dehao
>>
>> The patch can also be viewed from:
>>
>> http://codereview.appspot.com/6567079
>>
>> gcc/ChangeLog.google-4_7:
>> 2012-09-28  Dehao Chen  
>>
>> * cgraphbuild.c (build_cgraph_edges): Handle AutoFDO profile.
>> (rebuild_cgraph_edges): Likewise.
>> * cgraph.c (cgraph_clone_node): Likewise.
>> (clone_function_name): Likewise.
>> * cgraph.h (cgraph_node): New field.
>> * tree-pass.h (pass_ipa_auto_profile): New pass.
>> * cfghooks.c (make_forwarder_block): Handle AutoFDO profile.
>> * ipa-inline-transform.c (clone_inlined_nodes): Likewise.
>> * toplev.c (compile_file): Likewise.
>> (process_options): Likewise.
>> * debug.h (auto_profile_debug_hooks): New.
>> * cgraphunit.c (cgraph_finalize_compilation_unit): Handle AutoFDO
>> profile.
>> (cgraph_copy_node_for_versioning): Likewise.
>> * regs.h (REG_FREQ_FROM_BB): Likewise.
>> * gcov-io.h: (GCOV_TAG_AFDO_FILE_NAMES): New.
>> (GCOV_TAG_AFDO_FUNCTION): New.
>> (GCOV_TAG_AFDO_MODULE_GROUPING): New.
>> * ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Handle AutoFDO profile.
>> * ipa-inline.c (edge_hot_enough_p): Likewise.
>> (edge_badness): Likewise.
>> (inline_small_functions): Likewise.
>> * dwarf2out.c (auto_profile_debug_hooks): New.
>> * opts.c (common_handle_option): Handle AutoFDO profile.
>> * timevar.def (TV_IPA_AUTOFDO): New.
>> * predict.c (compute_function_frequency): Handle AutoFDO profile.
>> (rebuild_frequencies): Handle AutoFDO profile.
>> * auto-profile.c (struct gcov_callsite_pos): New.
>> (struct gcov_callsite): New.
>> (struct gcov_stack): New.
>> (struct gcov_function): New.
>> (struct afdo_bfd_name): New.
>> (struct afdo_module): New.
>> (afdo_get_filename): New.
>> (afdo_get_original_name_size): New.
>> (afdo_get_bfd_name): New.
>> (afdo_read_bfd_names): New.
>> (afdo_stack_hash): New.
>> (afdo_stack_eq): New.
>> (afdo_function_hash): New.
>> (afdo_function_eq): New.
>> (afdo_bfd_name_hash): New.
>> (afdo_bfd_name_eq): New.
>> (afdo_bfd_name_del): New.
>> (afdo_module_hash): New.
>> (afdo_module_eq): New.
>> (afdo_module_num_strings): New.
>> (afdo_add_module): New.
>> (read_aux_modules): New.
>> (get_inline_stack_size_by_stmt): New.
>> (get_inline_stack_size_by_edge): New.
>> (get_function_name_from_block): New.
>> (get_inline_stack_by_stmt): New.
>> (get_inline_stack_by_edge): New.
>> (afdo_get_function_count): New.
>> (afdo_set_current_function_count): New.
>> (afdo_add_bfd_name_mapping): New.
>> (afdo_add_copy_scale): New.
>> (get_stack_count): New.
>> (get_stmt_count): New.
>> (afdo_get_callsite_count): New.
>> (afdo_get_bb_count): New.
>> (afdo_annotate_cfg): New.
>> (read_profile): New.
>> (process_auto_profile): New.
>> (init_auto_profile): New.
>> (end_auto_profile): New.
>> (afdo_find_equiv_class): New.
>> (afdo_propagate_single_edge): New.
>> (afdo_propagate_multi_edge): New.
>> (afdo_propagate_circuit): New.
>> (afdo_propagate): New.
>> (afdo_calculate_branch_prob): New.
>> (auto_profile): New.
>> (gate_auto_profile_ipa): New.
>> (struct simple_ipa_opt_pass): New.
>> * auto-profile.h (init_auto_profile): New.
>> (end_auto_profile): New.
>> (process_auto_profile): New.
>> (afdo_set_current_function_count): New.
>> (afdo_add_bfd_name_mapping): New.
>> (afdo_add_copy_scale): New.
>> (afdo_calculate_branch_prob): New.
>> (afdo_get_callsite_count): New.
>> (afdo_get_bb_count): New.
>> * profile.c (compute_branch_probabilities): Handle AutoFDO profile.
>> (branch_prob): Likeise.
>> * loop-unroll.c (decide_unroll_runtime_iterations): Likewise.
>> * coverage.c (coverage_init): Likewise.
>> * tree-ssa-live.c (remove_unused_scope_block_p): Likewise.
>> * common.opt (fauto-profile): New.
>> * tree-inline.c (copy_bb): Handle AutoFDO profile.
>> (copy

Re: [google] AutoFDO implementation

2012-10-06 Thread Dehao Chen
On Sat, Oct 6, 2012 at 6:13 PM, Andi Kleen  wrote:
> Jan Hubicka  writes:
>>
>> I think it is useful feature, yes (and was in my TODO list for quite some
>> time). Unlike edge profiles, these profiles should be also more independent 
>> of
>> source code/configuration changes.
>
> It would be good to look at the tool, but:

Thanks a lot for the interests and feedback.

>
> - Does it use the perf LBR support to estimate the last 16 basic blocks
> for each sample?

Yes, it uses LBR to calculate binary level profile.

>
> - It doesn't seem to use callgraphs for estimating common indirect calls
> for devirtualization?

At this phase, VPT is not supported, but will definitely support later.

>
> I always disliked the limitation of the current indirect call profiling
> to only support a single translation unit. Using callgraphs around
> that would be quite nice.

You can take a look at LIPO, which supports multiple indirect call
targets. But yes, LBR can give us complete indirect call histogram for
each callsite, and almost for free. We'll use that to drive indirect
call promotion.

Currently, open-sourcing the tool to generate profile data file seemed
more work than rewriting it because it has some internal dependencies.
My plan is to integrate it as part of perf, i.e. one can use perf to
get .afdo file directly. We'll try to make this happen soon and make
into the gcc 4.9 release schedule.

Cheers,
Dehao

>
> This would only work for non tail calls and with frame pointers, or
> using LBRs again.
>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only


[PATCH] Add TF support for OpenBSD/i386 and OpenBSD/amd64

2012-10-06 Thread Mark Kettenis
Adds the necessary support bits to libgcc.  All other mainstream
i386/amd64 targets already have this.

Tested on i386-unknown-openbsd5.2 and x86_64-unknown-openbsd5.2.
Fixes a couple of testcases.


libgcc/:

2012-10-06  Mark Kettenis  

* config.host (i[34567]86-*-openbsd* and x86_64-*-openbsd*):
Add to list of i[34567]86-*-* and x86_64-*-* soft-fp targets.

gcc/:

2012-10-06  Mark Kettenis  

* config/i386/openbsdelf.h (LIBGCC2_HAS_TF_MODE, LIBGCC2_TF_CEXT) 
(TF_SIZE): Define.


Index: libgcc/config.host
===
--- libgcc/config.host  (revision 192154)
+++ libgcc/config.host  (working copy)
@@ -1156,7 +1156,8 @@
   i[34567]86-*-gnu* | \
   i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]* | \
   i[34567]86-*-cygwin* | i[34567]86-*-mingw* | x86_64-*-mingw* | \
-  i[34567]86-*-freebsd* | x86_64-*-freebsd*)
+  i[34567]86-*-freebsd* | x86_64-*-freebsd* | \
+  i[34567]86-*-openbsd* | x86_64-*-openbsd*)
tmake_file="${tmake_file} t-softfp-tf"
if test "${host_address}" = 32; then
tmake_file="${tmake_file} i386/${host_address}/t-softfp"
Index: gcc/config/i386/openbsdelf.h
===
--- gcc/config/i386/openbsdelf.h(revision 192154)
+++ gcc/config/i386/openbsdelf.h(working copy)
@@ -111,3 +111,9 @@
 #define OBSD_HAS_CORRECT_SPECS
 
 #define HAVE_ENABLE_EXECUTE_STACK
+
+/* Put all *tf routines in libgcc.  */
+#undef LIBGCC2_HAS_TF_MODE
+#define LIBGCC2_HAS_TF_MODE 1
+#define LIBGCC2_TF_CEXT q
+#define TF_SIZE 113