Re: PATCH: Correct alloca length in dump_gimple_bb_header

2012-10-30 Thread Jakub Jelinek
On Mon, Oct 29, 2012 at 10:53:42PM -0700, H.J. Lu wrote:
> 2012-10-30  H.J. Lu  
> 
>   * gimple-pretty-print.c (dump_gimple_bb_header): Avoid alloca.

Ok, thanks.

Jakub


Adapt one fold-const optimization for vectors

2012-10-30 Thread Marc Glisse

Hello,

one more optimization that needed help for vectors; it crashed on (x…
Because of PR 55001, testcases are awkward to add (I could do an x86-only
one if needed).


bootstrap+testsuite.

2012-10-30  Marc Glisse  

* fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
(fold_binary_loc): Call it for VEC_COND_EXPR.

--
Marc Glisse


[patch,libgcc] m32r*rtems* add crtinit.o and crtfini.o

2012-10-30 Thread Ralf Corsepius

Hi,

I would like to apply the patch below to trunk and gcc-4.7-branch.

This patch was originally submitted by Joel Sherrill back in May
(http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01180.html),
but never received any feedback.

It has been part of the rtems-gcc patches since then.

Ralf
2012-05-16  Joel Sherrill 

	* config.host (m32r-*-rtems*): Include crtinit.o and crtfini.o
	as extra_parts.

diff --git a/libgcc/config.host b/libgcc/config.host
index 051d6b0..bbf21a9 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -693,6 +693,7 @@ m32r-*-elf*)
  	;;
 m32r-*-rtems*)
 	tmake_file="$tmake_file m32r/t-m32r t-fdpbit"
+	extra_parts="$extra_parts crtinit.o crtfini.o"
 	;;
 m32rle-*-elf*)
 	tmake_file=t-fdpbit


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Matthew Gretton-Dann
On 30 October 2012 05:20, Teresa Johnson  wrote:
> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 192692)
> +++ cfgrtl.c(working copy)
> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>   partition boundaries).  See  the comments at the top of
>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +  || BB_PARTITION (a) != BB_PARTITION (b))
>  return false;
>
>/* Protect the loop latches.  */
> @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
>   partition boundaries).  See  the comments at the top of
>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +  || BB_PARTITION (a) != BB_PARTITION (b))
>  return false;
>
>/* Protect the loop latches.  */

As this if() condition seems to be the canonical way to detect being
in a different partition, should it be moved out into a query function,
with all of cfgrtl.c updated to use it?

[Note I am not a maintainer and so can't approve/reject your patch].

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-30 Thread Jakub Jelinek
On Mon, Oct 29, 2012 at 02:07:55PM -0400, David Miller wrote:
> > I'd like to close the stage 1 phase of GCC 4.8 development
> > on Monday, November 5th.  If you have still patches for new features you'd
> > like to see in GCC 4.8, please post them for review soon.  Patches
> > posted before the freeze, but reviewed shortly after the freeze, may
> > still go in, further changes should be just bugfixes and documentation
> > fixes.
> 
> I'd like to get the Sparc cbcond stuff in (3 revisions posted) which
> is waiting for Eric B. to do some Solaris specific work.

That has been posted in stage 1, so it is certainly ok to commit it even
during early stage 3.  And, on a case by case basis exceptions are always
possible.  This hasn't changed in the last few years.  By "reviewed
shortly after the freeze" I just mean that e.g. having large intrusive
patches posted now, but reviewed in late December, is already too late.

As for postponing end of stage 1 by a few weeks because of the storm, I'm
afraid if we want to keep roughly timely releases we don't have that luxury.
If you look at http://gcc.gnu.org/develop.html, ending stage 1 around end of
October happened already for 4.6 and 4.7; for 4.5 it was a month earlier and
for 4.4 even two months earlier.  The 4.7 bugfixing went IMHO smoothly, but
we certainly have to expect lots of bugfixing.

> I'd also like to enable LRA for at least 32-bit sparc, even if I can't
> find the time to work on auditing 64-bit completely.

I agree with Eric that it is better to enable it for the whole target
together, rather than based on some options.  Enabling LRA in early stage 3
for some targets should be ok, if it doesn't require too large and intrusive
changes to the generic code that could destabilize other targets.

Jakub


[PATCH] PR 54472

2012-10-30 Thread Andrey Belevantsev

Hello,

This PR is due to the selective scheduling missing the dependencies with 
implicit_sets.  From the sched-deps.c code it looks like implicit sets 
generate anti dependencies with sets, uses, and clobbers, so that's 
what I've done in the patch below.  Vlad, as it looks like you added 
implicit sets, does the above conclusion look correct?  I will commit the 
patch once bootstrapping and testing complete.


Yours,
Andrey

2012-10-30  Andrey Belevantsev  

PR rtl-optimization/54472

* sel-sched-ir.c (has_dependence_note_reg_set): Handle
implicit sets.
(has_dependence_note_reg_clobber,
has_dependence_note_reg_use): Likewise.

2012-10-30  Andrey Belevantsev  

PR rtl-optimization/54472

* gcc.dg/pr54472.c: New test.

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 2a7a170..220568a 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3185,7 +3185,7 @@ has_dependence_note_reg_set (int regno)
  || reg_last->clobbers != NULL)
*dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT;

-  if (reg_last->uses)
+  if (reg_last->uses || reg_last->implicit_sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
 }
 }
@@ -3205,7 +3205,7 @@ has_dependence_note_reg_clobber (int regno)
   if (reg_last->sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT;

-  if (reg_last->uses)
+  if (reg_last->uses || reg_last->implicit_sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
 }
 }
@@ -3225,7 +3225,7 @@ has_dependence_note_reg_use (int regno)
   if (reg_last->sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_TRUE;

-  if (reg_last->clobbers)
+  if (reg_last->clobbers || reg_last->implicit_sets)
*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;

   /* Merge BE_IN_SPEC bits into *DSP when the dependency producer
diff --git a/gcc/testsuite/gcc.dg/pr54472.c b/gcc/testsuite/gcc.dg/pr54472.c
new file mode 100644
index 000..9395203
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr54472.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
+/* { dg-options "-O -fschedule-insns -fselective-scheduling" } */
+
+int main ()
+{
+  int a[3][3][3];
+  __builtin_memset (a, 0, sizeof a);
+  return 0;
+}


Add myself to MAINTAINERS

2012-10-30 Thread Gopalasubramanian, Ganesh
Adding myself to the list of members in "write after approval".

Index: ChangeLog
===
--- ChangeLog   (revision 192977)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2012-10-30 Ganesh Gopalasubramanian  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2012-10-26  James Greenhalgh  

* MAINTAINERS (Write After Approval): Add myself.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 192977)
+++ MAINTAINERS (working copy)
@@ -372,6 +372,7 @@
 Chao-ying Fu   f...@mips.com
 Gary Funck g...@intrepid.com
 Pompapathi V Gadad pompapathi.v.ga...@nsc.com
+Gopalasubramanian Ganesh   ganesh.gopalasubraman...@amd.com
 Kaveh Ghazigh...@gcc.gnu.org
 Matthew Gingellging...@gnat.com
 Tristan Gingoldging...@adacore.com

Regards
Ganesh



Re: [PATCH, libstdc++] Make empty std::string storage readonly

2012-10-30 Thread Michael Haubenwallner

On 08/30/2012 11:45 AM, Jonathan Wakely wrote:
> On 29 August 2012 13:25, Michael Haubenwallner wrote:
>> On 08/28/2012 08:12 PM, Jonathan Wakely wrote:
>>> On 28 August 2012 18:27, Michael Haubenwallner wrote:
>
> Does it actually produce a segfault? I suppose it might on some
> platforms, but not all, so I'm not sure it's worth changing.
>>
>> Using this patch on my x86_64 Gentoo Linux Desktop with gcc-4.7.1 does 
>> segfault
>> as expected - when I make sure the correct libstdc++ is used at runtime,
>> having the '_S_empty_rep_storage' symbol in the .rodata section rather than 
>> .bss.
> 
> If it works reliably on x86_64 then I think the patch is worth considering.
> 
> I'm on holiday for a week, so maybe one of the other maintainers will
> deal with it first.

Any chance to get this in for 4.8?

Thank you!
/haubi/


Re: Adapt one fold-const optimization for vectors

2012-10-30 Thread Marek Polacek
On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:
> Hello,
> 
> one more optimization that needed help for vectors, it crashed on
> (x…  Because of PR 55001, testcases are awkward to add (I could
> do an x86-only one if needed).
> 
> bootstrap+testsuite.
> 
> 2012-10-30  Marc Glisse  
> 
>   * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
>   (fold_binary_loc): call it for VEC_COND_EXPR.

Patch missing?



[SH, committed] PR 54963

2012-10-30 Thread Oleg Endo
Hello,

This is the latest proposed patch from the PR.
Tested on rev 192482 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
Pre-approved by Kaz in the PR.
Committed as rev 192983.

Cheers,
Oleg

gcc/ChangeLog:

PR target/54963
* config/sh/iterators.md (SIDI): New mode iterator.
* config/sh/sh.md (negdi2): Use parallel around operation and
T_REG clobber in expander.
(*negdi2): Mark output operand as early clobbered.
Add T_REG clobber.  Split after reload.  Simplify split code.
(abssi2, absdi2): Fold expanders into abs2.
(*abssi2, *absdi2): Fold into *abs2 insn_and_split.
Split insns before reload.
(*negabssi2, *negabsdi2): Fold into *negabs2.
Add T_REG clobber.  Split insns before reload.
(negsi_cond): Reformat.  Use emit_move_insn instead of
gen_movesi.
(negdi_cond): Reformat.  Use emit_move_insn instead of a pair
of gen_movsi.  Split insn before reload.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192482)
+++ gcc/config/sh/sh.md	(working copy)
@@ -5177,28 +5177,25 @@
 ;; Don't expand immediately because otherwise neg:DI (abs:DI) will not be
 ;; combined.
 (define_expand "negdi2"
-  [(set (match_operand:DI 0 "arith_reg_dest" "")
-	(neg:DI (match_operand:DI 1 "arith_reg_operand" "")))
-   (clobber (reg:SI T_REG))]
-  "TARGET_SH1"
-  "")
+  [(parallel [(set (match_operand:DI 0 "arith_reg_dest")
+		   (neg:DI (match_operand:DI 1 "arith_reg_operand")))
+	  (clobber (reg:SI T_REG))])]
+  "TARGET_SH1")
 
 (define_insn_and_split "*negdi2"
-  [(set (match_operand:DI 0 "arith_reg_dest" "=r")
-	(neg:DI (match_operand:DI 1 "arith_reg_operand" "r")))]
+  [(set (match_operand:DI 0 "arith_reg_dest" "=&r")
+	(neg:DI (match_operand:DI 1 "arith_reg_operand" "r")))
+   (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& reload_completed"
   [(const_int 0)]
 {
-  rtx low_src = gen_lowpart (SImode, operands[1]);
-  rtx high_src = gen_highpart (SImode, operands[1]);
-  rtx low_dst = gen_lowpart (SImode, operands[0]);
-  rtx high_dst = gen_highpart (SImode, operands[0]);
-
   emit_insn (gen_clrt ());
-  emit_insn (gen_negc (low_dst, low_src));
-  emit_insn (gen_negc (high_dst, high_src));
+  emit_insn (gen_negc (gen_lowpart (SImode, operands[0]),
+		   gen_lowpart (SImode, operands[1])));
+  emit_insn (gen_negc (gen_highpart (SImode, operands[0]),
+		   gen_highpart (SImode, operands[1])));
   DONE;
 })
 
@@ -5272,38 +5269,53 @@
 		(const_int -1)))]
   "TARGET_SHMEDIA" "")
 
-(define_expand "abssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "")
-  	(abs:SI (match_operand:SI 1 "arith_reg_operand" "")))
+(define_expand "abs2"
+  [(parallel [(set (match_operand:SIDI 0 "arith_reg_dest")
+		   (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
+	  (clobber (reg:SI T_REG))])]
+  "TARGET_SH1")
+
+(define_insn_and_split "*abs2"
+  [(set (match_operand:SIDI 0 "arith_reg_dest")
+  	(abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
(clobber (reg:SI T_REG))]
   "TARGET_SH1"
-  "")
-
-(define_insn_and_split "*abssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-  	(abs:SI (match_operand:SI 1 "arith_reg_operand" "r")))]
-  "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& can_create_pseudo_p ()"
   [(const_int 0)]
 {
-  emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
-  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
-		 const1_rtx));
+  if (mode == SImode)
+emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
+  else
+{
+  rtx high_src = gen_highpart (SImode, operands[1]);
+  emit_insn (gen_cmpgesi_t (high_src, const0_rtx));
+}
+
+  emit_insn (gen_neg_cond (operands[0], operands[1], operands[1],
+ const1_rtx));
   DONE;
 })
 
-(define_insn_and_split "*negabssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-  	(neg:SI (abs:SI (match_operand:SI 1 "arith_reg_operand" "r"]
+(define_insn_and_split "*negabs2"
+  [(set (match_operand:SIDI 0 "arith_reg_dest")
+	(neg:SIDI (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand"
+   (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& can_create_pseudo_p ()"
   [(const_int 0)]
 {
-  emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
-  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
-		 const0_rtx));
+  if (mode == SImode)
+emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
+  else
+{
+  rtx high_src = gen_highpart (SImode, operands[1]);
+  emit_insn (gen_cmpgesi_t (high_src, const0_rtx));
+}
+
+  emit_insn (gen_neg_cond (operands[0], operands[1], operands[1],
+ const0_rtx));
   DONE;
 })
 
@@ -5316,10 +5328,10 @@
 
 (define_insn_and_split "negsi_cond"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else:SI (eq:SI (reg

Re: [PATCH, libstdc++] Make empty std::string storage readonly

2012-10-30 Thread Jonathan Wakely
On 30 October 2012 09:05, Michael Haubenwallner wrote:
> Any chance to get this in for 4.8?

I'm looking into it today.


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Jakub Jelinek
On Mon, Oct 29, 2012 at 05:10:10PM +0100, Richard Biener wrote:
> On Mon, Oct 29, 2012 at 4:25 PM, Dehao Chen  wrote:
> > On Mon, Oct 29, 2012 at 7:17 AM, Michael Matz  wrote:
> >> Hi,
> >>
> >> On Mon, 29 Oct 2012, Richard Biener wrote:
> >>
> >>> > Well, you merely moved the bogus code to gimple-low.c.  It is bogus
> >>> > because you unconditionally overwrite TREE_BLOCK of all operands (and 
> >>> > all
> >
> > Emm, then in gimple-low.c, we should probably not unconditionally
> > overwrite gimple_block for stmts too?
> 
> gimple stmts have no block before gimple-low.

And tree expressions don't have TREE_BLOCK before gimple-low either.
So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
stmts as well as all expression in the operands.  It is not overwriting
anything, no frontend sets TREE_BLOCK for any expression, the way frontends
associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND
after gimplification, and it is gimple-low responsibility to set it.

In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK
initially.  Before the location_t changes, again it was gimple-low that
was the first setter of TREE_BLOCK, which was valid for all
IS_EXPR_CODE_CLASS.  So, IMNSHO gimple-low should merge location_t
with block for all gimple stmts and all tree expressions used in its
operands.  It shouldn't be set on trees that can be shared, so
say decls etc. should keep using just location_t's without associated block.
So perhaps the right test for gimple-low setting of block is
CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp).

Jakub


Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h

2012-10-30 Thread JonY
On 10/29/2012 21:05, JonY wrote:
>> ChangeLog
>> 2012-10-29  Jonathan Yong  
>>
>>  * config/os/mingw32-w64/os_defines.h: Remove 
>> _GLIBCXX_HAVE_BROKEN_VSWPRINTF
>>  as no longer required.
>>
>>
>>
>> Index: libstdc++-v3/config/os/mingw32-w64/os_defines.h
>> ===
>> --- libstdc++-v3/config/os/mingw32-w64/os_defines.h (revision 192802)
>> +++ libstdc++-v3/config/os/mingw32-w64/os_defines.h (working copy)
>> @@ -63,8 +63,9 @@
>>  // See libstdc++/20806.
>>  #define _GLIBCXX_HAVE_DOS_BASED_FILESYSTEM 1
>>
>> -// See  libstdc++/37522.
>> -#define _GLIBCXX_HAVE_BROKEN_VSWPRINTF 1
>> +// See  libstdc++/37522. mingw-w64 stdio redirect for C++
>> +// #define _GLIBCXX_HAVE_BROKEN_VSWPRINTF 1
>> +// Workaround added for mingw-w64 trunk headers r5437
>>
>>  // See libstdc++/43738
>>  // On native windows targets there is no ioctl function. And the existing
>>
> 
> 

Hi,

Can I have this in before 4.8 branches?  The maintainer is away for a few
weeks, but OK'ed it on IRC.







Re: [PATCH, libstdc++] Make empty std::string storage readonly

2012-10-30 Thread Jonathan Wakely
On 30 October 2012 09:28, Jonathan Wakely wrote:
> On 30 October 2012 09:05, Michael Haubenwallner wrote:
>> Any chance to get this in for 4.8?
>
> I'm looking into it today.

Consider the case where one object file containing
std::string().erase() is built with an older GCC without the fix for
PR 40518, then it's linked to a new libstdc++.so where the empty rep
is read-only.  The program will attempt to write to the empty rep, but
now it's read-only and will crash.  I don't think we can apply it
unless we change the library ABI so that no pre-PR40518 objects can
link to a libstdc++.so containing a read-only empty rep.


Re: PATCH: PR rtl-optimization/55093: [4.8 Regression] [x32] -maddress-mode=long failed

2012-10-30 Thread Richard Sandiford
"H.J. Lu"  writes:
>>> LRA has
>>>
>>>   if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
>>> {
>>>   rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;
>>>
>>>   if (! replace_p)
>>> {
>>>   offset += (ep->offset - ep->previous_offset);
>>>   offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
>>> }
>>>
>>>   if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
>>> to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), 
>>> to_rtx);
>>>
>>> Reload has
>>>
>>> rtx to_rtx = ep->to_rtx;
>>> offset += ep->offset;
>>> offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
>>>
>>> if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
>>>   to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)),
>>> to_rtx);
>>>
>>> (gdb) call debug_rtx (ep->to_rtx)
>>> (reg/f:DI 7 sp)
>>> (gdb) call debug_rtx (ep->from_rtx)
>>> (reg/f:DI 16 argp)
>>> (gdb)
>>>
>>> gen_lowpart returns (reg/f:DI 7 sp) for reload and (reg:SI 16 argp)
>>> for LRA.   They are caused by
>>>
>>>   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
>>>   /* We should convert arg register in LRA after the elimination
>>>  if it is possible.  */
>>>   && xregno == ARG_POINTER_REGNUM
>>>   && ! lra_in_progress)
>>> return -1;
>>>
>>> It doesn't work in this case.
>>>
>>
>> This testcase shows that LRA can't convert arg register after
>> the elimination.
>>
> Here is a patch to remove ra_in_progress check for
> ARG_POINTER_REGNUM.  Tested on Linux.x86-64.
> OK to install?

Thanks HJ.  This looks good to me.  As well as your testcase, I think
it would be dangerous to reduce this kind of subreg during non-final
elimination in cases where the argument pointer occupies more than one
hard register (like avr IIRC).  We could end up with something like
ARG_POINTER_REGNUM+1, which wouldn't show up as an elimination register
during the rest of LRA.

It's important that we do get rid of the subreg during the final
elimination stage, but I think alter_subreg already handles that case.

Since this code is outside the LRA files: patch is OK if Vlad agrees.

Richard


Re: [PATCH, libstdc++] Make empty std::string storage readonly

2012-10-30 Thread Jonathan Wakely
On 30 October 2012 10:11, Jonathan Wakely wrote:
> On 30 October 2012 09:28, Jonathan Wakely wrote:
>> On 30 October 2012 09:05, Michael Haubenwallner wrote:
>>> Any chance to get this in for 4.8?
>>
>> I'm looking into it today.
>
> Consider the case where one object file containing
> std::string().erase() is built with an older GCC without the fix for
> PR 40518, then it's linked to a new libstdc++.so where the empty rep
> is read-only.  The program will attempt to write to the empty rep, but
> now it's read-only and will crash.  I don't think we can apply it
> unless we change the library ABI so that no pre-PR40518 objects can
> link to a libstdc++.so containing a read-only empty rep.

If I can convince myself this can't happen then I'll commit it, but I
need to look into it further this evening.


Re: RFA: patch to fix PR55116

2012-10-30 Thread Richard Sandiford
"H.J. Lu"  writes:
> On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu  wrote:
>> On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu  wrote:
>>> On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov
>>>  wrote:
 On 12-10-29 12:21 PM, Richard Sandiford wrote:
>
> Vladimir Makarov  writes:
>>
>> H.J. in
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116
>>
>> reported an interesting address
>>
>> (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
>>   (const_int 2 [0x2]))
>>   (symbol_ref:SI ("glob_vol_int_arr") > 0x703c2720 glob_vol_int_arr>)) 0)
>>   (const_int 4294967295 [0x]))
>>
>> which can not be correctly extracted.  Here `and' with `subreg'
>> behaves as an address mutation.
>>
>> The following patch fixes the problem.
>>
>> Ok to commit, Richard?
>
> Heh, I wondered if subregs might still be used like that, and was almost
> tempted to add them just in case.
>
> I think this particular case is really a failed canonicalisation and that:
>
> (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0x))
>
> ought to be:
>
> (zero_extend:DI (foo:SI ...))

 Yes, that was my thought too.

> But I know I've approved MIPS patches to accept
> (and:DI ... (const_int 0x)) as an alternative.
>
>> Index: rtlanal.c
>> ===
>> --- rtlanal.c   (revision 192942)
>> +++ rtlanal.c   (working copy)
>> @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
>>  else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
>>   /* (and ... (const_int -X)) is used to align to X bytes.  */
>>   loc = &XEXP (*loc, 0);
>> +  else if (code == SUBREG
>> +  && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
>> +   /* (subreg (operator ...) ...) usually inside and is used for
>> +  mode conversion too.  */
>> +   loc = &XEXP (*loc, 0);
>
> I think the condition should be:
>
>else if (code == SUBREG
> && !OBJECT_P (SUBREG_REG (*loc))
> && subreg_lowpart (*loc))
>
> OK with that change, if it works.
>
 Yes, it works.
 I've submitted the following patch.

>>>
>>> It doesn't work right.  I will create a new testcase.
>>>
>>
>>
>
> This patch limits SUBREG to Pmode.  Tested on Linux/x86-64.
> OK to install?
>
> Thanks.

The address in this case is:

(plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
(const_int 8 [0x8]))
(subreg:SI (plus:DI (reg/f:DI 20 frame)
(const_int 32 [0x20])) 0))

which after Uros's subreg simplification patch shouldn't be allowed:
the subreg ought to be on the frame register rather than the plus.

The attached patch seems to work for the testcase.  Does it work
more generally?

Richard


gcc/
* lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
rather than gen_rtx_SUBREG.

Index: gcc/lra-eliminations.c
===
--- gcc/lra-eliminations.c  (revision 192983)
+++ gcc/lra-eliminations.c  (working copy)
@@ -550,7 +550,8 @@
  return x;
}
  else
-   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
+   return simplify_gen_subreg (GET_MODE (x), new_rtx,
+   GET_MODE (new_rtx), SUBREG_BYTE (x));
}
 
   return x;


[RFC] Heuristics to throttle the complette unrolling

2012-10-30 Thread Jan Hubicka
Hi,
for the past week or two I was playing with ways to throttle down the complete
unrolling heuristics.  I made the complete unroller use the tree-ssa-loop-niter
upper bound and unroll even in non-trivial cases, and this has turned out to
increase the number of completely unrolled loops by a great amount, so one can
see it as considerable code size growth at -O3 SPEC build.

http://gcc.opensuse.org/SPEC/CFP/sb-vangelis-head-64/Total-size_big.png
it is the largest jump on the right hand side in both peak and base runs.
There are also performance improvements, most importantly 11% on applu.

The intuition is that complete unrolling is most profitable when IV tests
are eliminated and a single basic block is created.  When conditionals stay
in the code it is not that good an idea, and functions containing calls
are also less interesting for unrolling since the calls are slow and the
optimization opportunities are not so great.

This patch reduces unrolling on loops having many branches or calls on the
hot path.  I found that for the applu speedup the number of branches needs
to be pretty high - about 32.

The patch saves about half of the growth introduced (but on different
benchmarks), and I think I can move all peeling to trees and reduce the
peeling limits a bit, too.

Does this sound sane? Any ideas?

Honza

Index: tree-ssa-loop-ivcanon.c
===
--- tree-ssa-loop-ivcanon.c (revision 192892)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -140,6 +140,20 @@ struct loop_size
  instructions after exit are not executed.  */
   int last_iteration;
   int last_iteration_eliminated_by_peeling;
+  
+  /* If some IV computation will become constant.  */
+  bool constant_iv;
+
+  /* Number of call stmts that are not a builtin and are pure or const
+ present on the hot path.  */
+  int num_pure_calls_on_hot_path;
+  /* Number of call stmts that are not a builtin and are not pure nor const
+ present on the hot path.  */
+  int num_non_pure_calls_on_hot_path;
+  /* Number of statements other than calls in the loop.  */
+  int non_call_stmts_on_hot_path;
+  /* Number of branches seen on the hot path.  */
+  int num_branches_on_hot_path;
 };
 
 /* Return true if OP in STMT will be constant after peeling LOOP.  */
@@ -188,7 +202,11 @@ constant_after_peeling (tree op, gimple
   return true;
 }
 
-/* Computes an estimated number of insns in LOOP, weighted by WEIGHTS.
+/* Computes an estimated number of insns in LOOP.
+   EXIT (if non-NULL) is an exit edge that will be eliminated in all but
+   the last iteration of the loop.
+   EDGE_TO_CANCEL (if non-NULL) is a non-exit edge eliminated in the last
+   iteration of the loop.
Return results in SIZE, estimate benefits for complete unrolling exiting by 
EXIT.  */
 
 static void
@@ -198,11 +216,17 @@ tree_estimate_loop_size (struct loop *lo
   gimple_stmt_iterator gsi;
   unsigned int i;
   bool after_exit;
+  VEC (basic_block, heap) *path = get_loop_hot_path (loop);
 
   size->overall = 0;
   size->eliminated_by_peeling = 0;
   size->last_iteration = 0;
   size->last_iteration_eliminated_by_peeling = 0;
+  size->num_pure_calls_on_hot_path = 0;
+  size->num_non_pure_calls_on_hot_path = 0;
+  size->non_call_stmts_on_hot_path = 0;
+  size->num_branches_on_hot_path = 0;
+  size->constant_iv = 0;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
@@ -221,6 +245,8 @@ tree_estimate_loop_size (struct loop *lo
  gimple stmt = gsi_stmt (gsi);
  int num = estimate_num_insns (stmt, &eni_size_weights);
  bool likely_eliminated = false;
+ bool likely_eliminated_last = false;
+ bool likely_eliminated_peeled = false;
 
  if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -231,11 +257,21 @@ tree_estimate_loop_size (struct loop *lo
  /* Look for reasons why we might optimize this stmt away. */
 
  /* Exit conditional.  */
- if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
+ if (exit && body[i] == exit->src
+  && stmt == last_stmt (exit->src))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "   Exit condition will be eliminated.\n");
- likely_eliminated = true;
+   fprintf (dump_file, "   Exit condition will be eliminated "
+"in peeled copies.\n");
+ likely_eliminated_peeled = true;
+   }
+ else if (edge_to_cancel && body[i] == edge_to_cancel->src
+  && stmt == last_stmt (edge_to_cancel->src))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "   Exit condition will be eliminated "
+"in last copy.\n");
+ likely_eliminated_last = true;
}
  /* Sets of IV variables  */
  else if (gimple_code

Re: [RFC] Heuristics to throttle the complette unrolling

2012-10-30 Thread Richard Biener
On Tue, 30 Oct 2012, Jan Hubicka wrote:

> Hi,
> for past week or two I was playing with ways to throttle down the complette
> unrolling heuristics.  I made complette unroller to use the 
> tree-ssa-loop-niter
> upper bound and unroll even in non-trivial cases and this has turned out to
> increase number of complettely unrolled loops by great amount, so one can
> see it as considerable code size growth at -O3 SPEC build.
> 
> http://gcc.opensuse.org/SPEC/CFP/sb-vangelis-head-64/Total-size_big.png
> it is the largest jump on the right hand side in both peak and base runs.
> There are also performance improvements, most importantly 11% on applu.
> 
> The intuition is that complete unrolling is most profitable when IV tests
> are eliminated and a single basic block is created.  When conditionals stay
> in the code it is not that good an idea, and functions containing calls
> are also less interesting for unrolling since the calls are slow and the
> optimization opportunities are not so great.
> 
> This patch reduces unrolling on loops having many branches or calls on the
> hot path.  I found that for the applu speedup the number of branches needs
> to be pretty high - about 32.
> 
> The patch saves about half of the growth introduced (but on different
> benchmarks), and I think I can move all peeling to trees and reduce the
> peeling limits a bit, too.
> 
> Does this sound sane? Any ideas?

Yes, this sounds ok (beware of the unrelated PARAM_MAX_ONCE_PEELED_INSNS
removal in the patch below).

Richard.

> Honza
> 
> Index: tree-ssa-loop-ivcanon.c
> ===
> --- tree-ssa-loop-ivcanon.c   (revision 192892)
> +++ tree-ssa-loop-ivcanon.c   (working copy)
> @@ -140,6 +140,20 @@ struct loop_size
>   instructions after exit are not executed.  */
>int last_iteration;
>int last_iteration_eliminated_by_peeling;
> +  
> +  /* If some IV computation will become constant.  */
> +  bool constant_iv;
> +
> +  /* Number of call stmts that are not a builtin and are pure or const
> + present on the hot path.  */
> +  int num_pure_calls_on_hot_path;
> +  /* Number of call stmts that are not a builtin and are not pure nor const
> + present on the hot path.  */
> +  int num_non_pure_calls_on_hot_path;
> +  /* Number of statements other than calls in the loop.  */
> +  int non_call_stmts_on_hot_path;
> +  /* Number of branches seen on the hot path.  */
> +  int num_branches_on_hot_path;
>  };
>  
>  /* Return true if OP in STMT will be constant after peeling LOOP.  */
> @@ -188,7 +202,11 @@ constant_after_peeling (tree op, gimple
>return true;
>  }
>  
> -/* Computes an estimated number of insns in LOOP, weighted by WEIGHTS.
> +/* Computes an estimated number of insns in LOOP.
> +   EXIT (if non-NULL) is an exite edge that will be eliminated in all but 
> last
> +   iteration of the loop.
> +   EDGE_TO_CANCEL (if non-NULL) is an non-exit edge eliminated in the last 
> iteration
> +   of loop.
> Return results in SIZE, estimate benefits for complete unrolling exiting 
> by EXIT.  */
>  
>  static void
> @@ -198,11 +216,17 @@ tree_estimate_loop_size (struct loop *lo
>gimple_stmt_iterator gsi;
>unsigned int i;
>bool after_exit;
> +  VEC (basic_block, heap) *path = get_loop_hot_path (loop);
>  
>size->overall = 0;
>size->eliminated_by_peeling = 0;
>size->last_iteration = 0;
>size->last_iteration_eliminated_by_peeling = 0;
> +  size->num_pure_calls_on_hot_path = 0;
> +  size->num_non_pure_calls_on_hot_path = 0;
> +  size->non_call_stmts_on_hot_path = 0;
> +  size->num_branches_on_hot_path = 0;
> +  size->constant_iv = 0;
>  
>if (dump_file && (dump_flags & TDF_DETAILS))
>  fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
> @@ -221,6 +245,8 @@ tree_estimate_loop_size (struct loop *lo
> gimple stmt = gsi_stmt (gsi);
> int num = estimate_num_insns (stmt, &eni_size_weights);
> bool likely_eliminated = false;
> +   bool likely_eliminated_last = false;
> +   bool likely_eliminated_peeled = false;
>  
> if (dump_file && (dump_flags & TDF_DETAILS))
>   {
> @@ -231,11 +257,21 @@ tree_estimate_loop_size (struct loop *lo
> /* Look for reasons why we might optimize this stmt away. */
>  
> /* Exit conditional.  */
> -   if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
> +   if (exit && body[i] == exit->src
> +&& stmt == last_stmt (exit->src))
>   {
> if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file, "   Exit condition will be eliminated.\n");
> -   likely_eliminated = true;
> + fprintf (dump_file, "   Exit condition will be eliminated "
> +  "in peeled copies.\n");
> +   likely_eliminated_peeled = true;
> + }
> +   else if (edge_to_cancel && body[i] == edge_to_cancel->src
> +&& stmt == last_stmt (edge_to_ca

Re: RFA: patch to fix PR55116

2012-10-30 Thread H.J. Lu
On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford
 wrote:
> "H.J. Lu"  writes:
>> On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu  wrote:
>>> On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu  wrote:
 On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov
  wrote:
> On 12-10-29 12:21 PM, Richard Sandiford wrote:
>>
>> Vladimir Makarov  writes:
>>>
>>> H.J. in
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116
>>>
>>> reported an interesting address
>>>
>>> (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
>>>   (const_int 2 [0x2]))
>>>   (symbol_ref:SI ("glob_vol_int_arr") <var_decl 0x703c2720 glob_vol_int_arr>)) 0)
>>>   (const_int 4294967295 [0xffffffff]))
>>>
>>> which can not be correctly extracted.  Here `and' with `subreg'
>>> behaves as an address mutation.
>>>
>>> The following patch fixes the problem.
>>>
>>> Ok to commit, Richard?
>>
>> Heh, I wondered if subregs might still be used like that, and was almost
>> tempted to add them just in case.
>>
>> I think this particular case is really a failed canonicalisation and 
>> that:
>>
>> (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff))
>>
>> ought to be:
>>
>> (zero_extend:DI (foo:SI ...))
>
> Yes, that was my thought too.
>
>> But I know I've approved MIPS patches to accept
>> (and:DI ... (const_int 0xffffffff)) as an alternative.
>>
>>> Index: rtlanal.c
>>> ===
>>> --- rtlanal.c   (revision 192942)
>>> +++ rtlanal.c   (working copy)
>>> @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
>>>  else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
>>>   /* (and ... (const_int -X)) is used to align to X bytes.  */
>>>   loc = &XEXP (*loc, 0);
>>> +  else if (code == SUBREG
>>> +  && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
>>> +   /* (subreg (operator ...) ...) usually inside and is used for
>>> +  mode conversion too.  */
>>> +   loc = &XEXP (*loc, 0);
>>
>> I think the condition should be:
>>
>>else if (code == SUBREG
>> && !OBJECT_P (SUBREG_REG (*loc))
>> && subreg_lowpart_p (*loc))
>>
>> OK with that change, if it works.
>>
> Yes, it works.
> I've submitted the following patch.
>

 It doesn't work right.  I will create a new testcase.

>>>
>>>
>>
>> This patch limits SUBREG to Pmode.  Tested on Linux/x86-64.
>> OK to install?
>>
>> Thanks.
>
> The address in this case is:
>
> (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
> (const_int 8 [0x8]))
> (subreg:SI (plus:DI (reg/f:DI 20 frame)
> (const_int 32 [0x20])) 0))
>
> which after Uros's subreg simplification patch shouldn't be allowed:
> the subreg ought to be on the frame register rather than the plus.
>
> The attached patch seems to work for the testcase.  Does it work
> more generally?
>
> Richard
>
>
> gcc/
> * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
> rather than gen_rtx_SUBREG.
>
> Index: gcc/lra-eliminations.c
> ===
> --- gcc/lra-eliminations.c  (revision 192983)
> +++ gcc/lra-eliminations.c  (working copy)
> @@ -550,7 +550,8 @@
>   return x;
> }
>   else
> -   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
> +   return simplify_gen_subreg (GET_MODE (x), new_rtx,
> +   GET_MODE (new_rtx), SUBREG_BYTE (x));
> }
>
>return x;

I am running the full test.

Thanks.


-- 
H.J.


Re: Fix estimation of array accesses

2012-10-30 Thread Richard Biener
On Mon, 29 Oct 2012, Jan Hubicka wrote:

> > > ICK ...
> > > 
> > > Why not sth as simple as
> > > 
> > >  return num_ssa_operands (stmt, SSA_OP_USE);
> > > 
> > > ?  a[1][2] and b[2] really have the same cost, variable length
> > > objects have extra SSA operands in ARRAY_REF/COMPONENT_REF for
> > > the size.  Thus, stmt cost somehow should reflect the number
> > > of dependent stmts.
> > > 
> > > So in estimate_num_insns I'd try
> > > 
> > > int
> > > estimate_num_insns (gimple stmt, eni_weights *weights)
> > > {
> > >   unsigned cost, i;
> > >   enum gimple_code code = gimple_code (stmt);
> > >   tree lhs;
> > >   tree rhs;
> > > 
> > >   switch (code)
> > >   {
> > >case GIMPLE_ASSIGN:
> > > /* Initialize with the number of SSA uses, one is free.  */
> > > cost = num_ssa_operands (stmt, SSA_OP_USE);
> > > if (cost > 1)
> > >   --cost;
> > 
> > Hi,
> > this is the updated patch I am testing after today's discussion.  I decided to
> > drop the ipa-inline-analysis changes and do that incrementally. So the patch
> > now trashes tramp3d performance by increasing need for early-inlining-insns,
> > but it is not unexpected.
> > 
> > The patch also fixes accounting of addr expr that got previously confused
> > with a load.
> > 
> > Does this seem sane?
> > 
> > * tree-inline.c (estimate_operator_cost): Return 1 for non-trivial
> > ADDR_EXPR operations.
> > (estimate_num_insns): Do not confuse general single rhs with
> > loads; account cost of non-trivial addr_expr for ASSIGN_EXPR,
> > GIMPLE_RETURN and GIMPLE_ASM
> 
> Hi,
> this patch actually does not cause that much tramp3d fuzz and no differences
> in the testsuite due to unrolling changes
> 
> The changes are the constants used when accounting loads and stores.
> Typical store has two SSA uses (one for address and one for value to store).
> Of course we lose the difference between array offset and pointer dereference.
> 
> Typical load/address expression has one SSA use (for the address)
> 
> Bootstrapped/regtested x86_64-linux, OK?

ChangeLog?

> Honza
> 
> Index: tree-inline.c
> ===
> --- tree-inline.c (revision 192945)
> +++ tree-inline.c (working copy)
> @@ -3447,6 +3447,19 @@ estimate_operator_cost (enum tree_code c
>if (TREE_CODE (op2) != INTEGER_CST)
>  return weights->div_mod_cost;
>return 1;
> +case ADDR_EXPR:
> +  { 
> +tree addr_base;
> +HOST_WIDE_INT addr_offset;
> +
> + addr_base = get_addr_base_and_unit_offset (TREE_OPERAND (op1, 0),
> +&addr_offset);
> + /* If the offset is variable or with non-zero offset, return 2.  */
> + if (!addr_base || addr_offset != 0 || TREE_CODE (addr_base) != MEM_REF
> + || !integer_zerop (TREE_OPERAND (addr_base, 1)))
> +   return 1;

The comment doesn't match the code.  Why the TREE_CODE (addr_base) != 
MEM_REF check?  If it isn't a MEM_REF then it is a plain decl, thus
no dereference.  So it's not clear what you want here ...?

It looks like you want to pessimize variable addresses here,
like &a.a[i]?  Before, all ADDR_EXPR cost was zero.

I'd say you want simply

if (!addr_base || addr_offset != 0)
  return 1;

no?  Or even

if (!addr_base
|| (addr_offset != 0 && TREE_CODE (addr_base) == MEM_REF))
  return 1;

that keeps &decl + CST as cost 0.  Or even

if (!addr_base)
  return 1;

that even keeps ptr + CST as cost 0.  Both because they are likely
combined with some complex addressing mode later.

> +  }
> +  return 0;

Inside the } please.

>  
>  default:
>/* We expect a copy assignment with no operator.  */
> @@ -3512,14 +3525,24 @@ estimate_num_insns (gimple stmt, eni_wei
>lhs = gimple_assign_lhs (stmt);
>rhs = gimple_assign_rhs1 (stmt);
>  
> -  if (is_gimple_reg (lhs))
> - cost = 0;
> -  else
> +  /* Store.  */
> +  if (gimple_vdef (stmt))
>   cost = estimate_move_cost (TREE_TYPE (lhs));
> +  else
> + cost = 0;

That change seems bogus.  A !is_gimple_reg lhs is always a store.

>  
> -  if (!is_gimple_reg (rhs) && !is_gimple_min_invariant (rhs))
> +  /* Load.  */
> +  if (gimple_vuse (stmt))
>   cost += estimate_move_cost (TREE_TYPE (rhs));

Likewise.  If rhs1 is not a register or a constant this is a load.
All stores also have a VUSE so you are always accounting a store
as aggregate copy this way.

>  
> +  /* Stores, loads and address expressions may have variable array
> +  references in them. Account these.  */
> +  if (gimple_vdef (stmt))
> + cost += MAX (0, (int)num_ssa_operands (stmt, SSA_OP_USE) - 2);
> +  else if (gimple_vuse (stmt)
> +   || gimple_assign_rhs_code (stmt) == ADDR_EXPR)
> + cost += MAX (0, (int)num_ssa_operands (stmt, SSA_OP_USE) - 1);
> +

Rather than doing this here in this awkwa

Re: [patch] Unify bitmap interface.

2012-10-30 Thread Diego Novillo
On Tue, Oct 30, 2012 at 2:17 AM, Bin.Cheng  wrote:
> On Tue, Oct 30, 2012 at 8:23 AM, Lawrence Crowl  wrote:
>> On 10/29/12, Diego Novillo  wrote:
>>> On Oct 29, 2012 Diego Novillo  wrote:
>>> > Just to make sure.  Testing on ppc should be fast, for example.
>>>
>>> And useless.  Your patch does not touch ppc.
>>
>> I've fixed the #if 0 and the remaining suggestions will happen in
>> another patch.  I've committed this one.
>>
>> ===
>>
>> This patch implements the unification of the *bitmap interfaces as discussed.
>> Essentially, we rename ebitmap and sbitmap functions to use the same names
>> as the bitmap functions.  This rename works because we can now overload
>> on the bitmap type.  Some macros now become inline functions to enable
>> that overloading.
>>
>> The sbitmap non-bool returning bitwise operations have been merged with
>> the bool versions.  Sometimes this merge involved modifying the non-bool
>> version to compute the bool value, and sometimes modifying bool version to
>> add additional work from the non-bool version.  The redundant routines have
>> been removed.
>>
>> The allocation functions have not been renamed, because we often do not
>> have an argument on which to overload.  The cardinality functions have not
>> been renamed, because they have different parameters, and are thus not
>> interchangeable.  The iteration functions have not been renamed, because
>> they are functionally different.
>>
>> Tested on x86_64, contrib/config-list.mk testing passed.
>> Index: gcc/ChangeLog
>>
>
> Just one question: Should we change the name of functions
> "sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/sbitmap_union_of_succs/sbitmap_union_of_preds"
> too? It might be a little confusing that sbitmap_* is used among
> bitmap_*.

Yes.  Lawrence is proceeding with this unification in stages.  The
next few patches should rename these.  The only two sets of functions
that will remain separate for now are the iterators and the bitmap
creation routines, I think.  Lawrence?


Diego.


Re: [patch] Unify bitmap interface.

2012-10-30 Thread Diego Novillo
On Mon, Oct 29, 2012 at 4:25 PM, Steven Bosscher  wrote:
> On Mon, Oct 29, 2012 at 7:54 PM, Diego Novillo wrote:
>
>> Sure.  But the point is not to add more.  We should mechanically strip
>> all the #if 0 code from the tree, btw.  No point keeping all that
>> garbage around.
>
> Please no. A lot (if not most) if the #if 0 code serves as good
> documentation for why something is *not* done, other pieces are there
> to indicate possible enhancement, and some are useful for debugging.

I never really bought into that line of reasoning.  Documenting why an
approach was not taken is better done in words than in code that
will grow stale wrt the surrounding code.  Similarly for possible
enhancements.  Prose is better than code in those cases.  If debugging
code is useful, then it can remain predicated on some debugging
switch.


Diego.


[PATCH] Fix PR55111

2012-10-30 Thread Richard Biener

This fixes PR55111.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-10-30  Richard Biener  

PR tree-optimization/55111
* tree-ssa-pre.c (eliminate_insert): Properly fold the built
stmt.

* gcc.dg/torture/pr55111.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 192983)
--- gcc/tree-ssa-pre.c  (working copy)
*** eliminate_insert (gimple_stmt_iterator *
*** 3996,4003 
  
tree res = make_temp_ssa_name (TREE_TYPE (val), NULL, "pretmp");
gimple tem = gimple_build_assign (res,
!   build1 (TREE_CODE (expr),
!   TREE_TYPE (expr), leader));
gsi_insert_before (gsi, tem, GSI_SAME_STMT);
VN_INFO_GET (res)->valnum = val;
  
--- 3996,4003 
  
tree res = make_temp_ssa_name (TREE_TYPE (val), NULL, "pretmp");
gimple tem = gimple_build_assign (res,
!   fold_build1 (TREE_CODE (expr),
!TREE_TYPE (expr), leader));
gsi_insert_before (gsi, tem, GSI_SAME_STMT);
VN_INFO_GET (res)->valnum = val;
  
Index: gcc/testsuite/gcc.dg/torture/pr55111.c
===
*** gcc/testsuite/gcc.dg/torture/pr55111.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr55111.c  (working copy)
***
*** 0 
--- 1,24 
+ /* { dg-do compile } */
+ 
+ int a, b, c;
+ long d;
+ unsigned long *e;
+ 
+ int f(void)
+ {
+   for(;; a++)
+ {
+   if(c)
+   {
+ for(b = d = 0; b < 1; b++)
+   e = &d;
+ 
+ --*e;
+ 
+ if(d > 0)
+   a = 0;
+ 
+ return d;
+   }
+ }
+ }


Re: Adapt one fold-const optimization for vectors

2012-10-30 Thread Marc Glisse

On Tue, 30 Oct 2012, Marek Polacek wrote:


On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:

Hello,

one more optimization that needed help for vectors, it crashed on
(x
Because of PR 55001, testcases are awkward to add (I could do a x86-only
one if needed).

* fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors.
(fold_binary_loc): call it for VEC_COND_EXPR.


Patch missing?


Indeed, thanks for the notice.

Here it is.

--
Marc Glisse

Index: fold-const.c
===
--- fold-const.c(revision 192976)
+++ fold-const.c(working copy)
@@ -5952,42 +5952,47 @@ static tree
 fold_binary_op_with_conditional_arg (location_t loc,
 enum tree_code code,
 tree type, tree op0, tree op1,
 tree cond, tree arg, int cond_first_p)
 {
   tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1);
   tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0);
   tree test, true_value, false_value;
   tree lhs = NULL_TREE;
   tree rhs = NULL_TREE;
+  enum tree_code cond_code = COND_EXPR;
 
-  if (TREE_CODE (cond) == COND_EXPR)
+  if (TREE_CODE (cond) == COND_EXPR
+  || TREE_CODE (cond) == VEC_COND_EXPR)
 {
   test = TREE_OPERAND (cond, 0);
   true_value = TREE_OPERAND (cond, 1);
   false_value = TREE_OPERAND (cond, 2);
   /* If this operand throws an expression, then it does not make
 sense to try to perform a logical or arithmetic operation
 involving it.  */
   if (VOID_TYPE_P (TREE_TYPE (true_value)))
lhs = true_value;
   if (VOID_TYPE_P (TREE_TYPE (false_value)))
rhs = false_value;
 }
   else
 {
   tree testtype = TREE_TYPE (cond);
   test = cond;
   true_value = constant_boolean_node (true, testtype);
   false_value = constant_boolean_node (false, testtype);
 }
 
+  if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE)
+cond_code = VEC_COND_EXPR;
+
   /* This transformation is only worthwhile if we don't have to wrap ARG
  in a SAVE_EXPR and the operation can be simplified on at least one
  of the branches once its pushed inside the COND_EXPR.  */
   if (!TREE_CONSTANT (arg)
   && (TREE_SIDE_EFFECTS (arg)
  || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value)))
 return NULL_TREE;
 
   arg = fold_convert_loc (loc, arg_type, arg);
   if (lhs == 0)
@@ -6004,21 +6009,21 @@ fold_binary_op_with_conditional_arg (loc
   if (cond_first_p)
rhs = fold_build2_loc (loc, code, type, false_value, arg);
   else
rhs = fold_build2_loc (loc, code, type, arg, false_value);
 }
 
   /* Check that we have simplified at least one of the branches.  */
   if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
 return NULL_TREE;
 
-  return fold_build3_loc (loc, COND_EXPR, type, test, lhs, rhs);
+  return fold_build3_loc (loc, cond_code, type, test, lhs, rhs);
 }
 
 
 /* Subroutine of fold() that checks for the addition of +/- 0.0.
 
If !NEGATE, return true if ADDEND is +/-0.0 and, for all X of type
TYPE, X + ADDEND is the same as X.  If NEGATE, return true if X -
ADDEND is the same as X.
 
X + 0 and X - 0 both give X when X is NaN, infinite, or nonzero
@@ -9864,30 +9869,34 @@ fold_binary_loc (location_t loc,
   if (TREE_CODE (arg1) == COMPOUND_EXPR
  && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0)))
{
  tem = fold_build2_loc (loc, code, type, op0,
 fold_convert_loc (loc, TREE_TYPE (op1),
   TREE_OPERAND (arg1, 1)));
  return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg1, 0),
 tem);
}
 
-  if (TREE_CODE (arg0) == COND_EXPR || COMPARISON_CLASS_P (arg0))
+  if (TREE_CODE (arg0) == COND_EXPR
+ || TREE_CODE (arg0) == VEC_COND_EXPR
+ || COMPARISON_CLASS_P (arg0))
{
  tem = fold_binary_op_with_conditional_arg (loc, code, type, op0, op1,
 arg0, arg1,
 /*cond_first_p=*/1);
  if (tem != NULL_TREE)
return tem;
}
 
-  if (TREE_CODE (arg1) == COND_EXPR || COMPARISON_CLASS_P (arg1))
+  if (TREE_CODE (arg1) == COND_EXPR
+ || TREE_CODE (arg1) == VEC_COND_EXPR
+ || COMPARISON_CLASS_P (arg1))
{
  tem = fold_binary_op_with_conditional_arg (loc, code, type, op0, op1,
 arg1, arg0,
 /*cond_first_p=*/0);
  if (tem != NULL_TREE)
return tem;
}
 }
 
   switch (code)


Re: [patch] Apply conditional down cast to cgraph.h et.al.

2012-10-30 Thread Diego Novillo

On 2012-10-29 15:01 , Lawrence Crowl wrote:

On 10/27/12, Marc Glisse  wrote:

On Fri, 26 Oct 2012, Lawrence Crowl wrote:

2012-10-26  Lawrence Crowl  

missing '>'


Fixed.


* is-a.h: New.
(is_a <T> (U*)): New.  Test for is-a relationship.
(as_a <T> (U*)): New.  Treat as a derived type.
(dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.


I can't find this file in the patch...


I forgot to svn add.  Updated patch here.


This patch implements generic type query and conversion functions,
and applies them to the use of cgraph_node, varpool_node, and symtab_node.

The functions are:

bool is_a <TYPE> (pointer)
   Tests whether the pointer actually points to a more derived TYPE.

TYPE *as_a <TYPE> (pointer)
   Converts pointer to a TYPE*.

TYPE *dyn_cast <TYPE> (pointer)
   Converts pointer to TYPE* if and only if "is_a <TYPE> pointer".
   Otherwise, returns NULL.
   This function is essentially a checked down cast.

These functions reduce compile time and increase type safety when treating a
generic item as a more specific item.  In essence, the code change is from

   if (symtab_function_p (node))
 {
   struct cgraph_node *cnode = cgraph (node);
   
 }

to

   if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
 {
   
 }

The necessary conditional test defines a variable that holds a known good
pointer to the specific item and avoids subsequent conversion calls and
the assertion checks that may come with them.

When the property test is embedded within a larger condition, the variable
declaration gets pulled out of the condition.  (This leaves some room for
using the variable inappropriately.)

   if (symtab_variable_p (node)
   && varpool (node)->finalized)
 varpool_analyze_node (varpool (node));

becomes

   varpool_node *vnode = dyn_cast <varpool_node> (node);
   if (vnode && vnode->finalized)
 varpool_analyze_node (vnode);

Note that we have converted two sets of assertions in the calls to varpool
into safe and efficient use of a variable.


There are remaining calls to symtab_function_p and symtab_variable_p that
do not involve a pointer to a more specific type.  These have been converted
to calls to the functions is_a <cgraph_node> and is_a <varpool_node>.  The
original predicate functions have been removed.

The cgraph.h header defined both a struct and a function with the name
varpool_node.  This name overloading can cause some unintuitive error messages
when, as is common in C++, one omits the struct keyword when using the type.
I have renamed the function to varpool_node_for_decl.

Tested on x86_64.


Okay for trunk?
Index: gcc/ChangeLog

2012-10-29  Lawrence Crowl  

* is-a.h: New.
(is_a <T> (U*)): New.  Test for is-a relationship.
(as_a <T> (U*)): New.  Treat as a derived type.
(dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.
* cgraph.h (varpool_node): Rename to varpool_node_for_decl.
Adjust callers to match.
(is_a_helper <cgraph_node>::test (symtab_node_def *)): New.
(is_a_helper <varpool_node>::test (symtab_node_def *)): New.
(symtab_node_def::try_function): New.  Change most calls to
symtab_function_p with calls to dyn_cast <cgraph_node> (p).
(symtab_node_def::try_variable): New.  Change most calls to
symtab_variable_p with calls to dyn_cast <varpool_node> (p).
(symtab_function_p): Remove.  Change callers to use
 is_a <cgraph_node> (p) instead.
(symtab_variable_p): Remove.  Change callers to use
 is_a <varpool_node> (p) instead.
* cgraph.c (cgraph_node_for_asm): Remove redundant call to
symtab_node_for_asm.
* cgraphunit.c (symbol_finalized_and_needed): New.
(symbol_finalized): New.
(cgraph_analyze_functions): Split complicated conditionals out into
above new functions.
* Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h.


Thanks.  I really like this cleanup.  I have a few questions and 
comments below.  Honza, the patch looks OK to me but it touches a bunch 
of cgraph code, could you please go over it?




Index: gcc/is-a.h
===
--- gcc/is-a.h  (revision 0)
+++ gcc/is-a.h  (revision 0)
@@ -0,0 +1,118 @@
+/* Dynamic testing for abstract is-a relationships.
+   Copyright (C) 2012 Free Software Foundation, Inc.
+   Contributed by Lawrence Crowl.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+
+/*
+
+Suppose you have a symtab_node_

Re: [patch] Unify bitmap interface.

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 12:53 PM, Diego Novillo  wrote:
> On Mon, Oct 29, 2012 at 4:25 PM, Steven Bosscher  
> wrote:
>> On Mon, Oct 29, 2012 at 7:54 PM, Diego Novillo wrote:
>>
>>> Sure.  But the point is not to add more.  We should mechanically strip
>>> all the #if 0 code from the tree, btw.  No point keeping all that
>>> garbage around.
>>
>> Please no. A lot (if not most) if the #if 0 code serves as good
>> documentation for why something is *not* done, other pieces are there
>> to indicate possible enhancement, and some are useful for debugging.
>
> I never really bought into that line of reasoning.  Documenting why an
> approach was not taken is better done in words than in code that
> will grow stale wrt the surrounding code.  Similarly for possible
> enhancements.  Prose is better than code in those cases.  If debugging
> code is useful, then it can remain predicated on some debugging
> switch.

I agree with both of you - #if 0 code that is useful as a comment deserves
to be rewritten as a comment.

Richard.

>
> Diego.


Re: Adapt one fold-const optimization for vectors

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 1:14 PM, Marc Glisse  wrote:
> On Tue, 30 Oct 2012, Marek Polacek wrote:
>
>> On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote:
>>>
>>> Hello,
>>>
>>> one more optimization that needed help for vectors, it crashed on
>>> (x
>>> Because of PR 55001, testcases are awkward to add (I could
>>> do a x86-only one if needed).
>>>
>>> bootstrap+testsuite.
>>>
>>> 2012-10-30  Marc Glisse  
>>>
>>> * fold-const.c (fold_binary_op_with_conditional_arg): Handle
>>> vectors.
>>> (fold_binary_loc): call it for VEC_COND_EXPR.
>>
>>
>> Patch missing?
>
>
> Indeed, thanks for the notice.

Ok.

Thanks,
Richard.

> Here it is.
>
> --
> Marc Glisse
> Index: fold-const.c
> ===
> --- fold-const.c(revision 192976)
> +++ fold-const.c(working copy)
> @@ -5952,42 +5952,47 @@ static tree
>  fold_binary_op_with_conditional_arg (location_t loc,
>  enum tree_code code,
>  tree type, tree op0, tree op1,
>  tree cond, tree arg, int cond_first_p)
>  {
>tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1);
>tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0);
>tree test, true_value, false_value;
>tree lhs = NULL_TREE;
>tree rhs = NULL_TREE;
> +  enum tree_code cond_code = COND_EXPR;
>
> -  if (TREE_CODE (cond) == COND_EXPR)
> +  if (TREE_CODE (cond) == COND_EXPR
> +  || TREE_CODE (cond) == VEC_COND_EXPR)
>  {
>test = TREE_OPERAND (cond, 0);
>true_value = TREE_OPERAND (cond, 1);
>false_value = TREE_OPERAND (cond, 2);
>/* If this operand throws an expression, then it does not make
>  sense to try to perform a logical or arithmetic operation
>  involving it.  */
>if (VOID_TYPE_P (TREE_TYPE (true_value)))
> lhs = true_value;
>if (VOID_TYPE_P (TREE_TYPE (false_value)))
> rhs = false_value;
>  }
>else
>  {
>tree testtype = TREE_TYPE (cond);
>test = cond;
>true_value = constant_boolean_node (true, testtype);
>false_value = constant_boolean_node (false, testtype);
>  }
>
> +  if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE)
> +cond_code = VEC_COND_EXPR;
> +
>/* This transformation is only worthwhile if we don't have to wrap ARG
>   in a SAVE_EXPR and the operation can be simplified on at least one
>   of the branches once its pushed inside the COND_EXPR.  */
>if (!TREE_CONSTANT (arg)
>&& (TREE_SIDE_EFFECTS (arg)
>   || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value)))
>  return NULL_TREE;
>
>arg = fold_convert_loc (loc, arg_type, arg);
>if (lhs == 0)
> @@ -6004,21 +6009,21 @@ fold_binary_op_with_conditional_arg (loc
>if (cond_first_p)
> rhs = fold_build2_loc (loc, code, type, false_value, arg);
>else
> rhs = fold_build2_loc (loc, code, type, arg, false_value);
>  }
>
>/* Check that we have simplified at least one of the branches.  */
>if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
>  return NULL_TREE;
>
> -  return fold_build3_loc (loc, COND_EXPR, type, test, lhs, rhs);
> +  return fold_build3_loc (loc, cond_code, type, test, lhs, rhs);
>  }
>
>
>  /* Subroutine of fold() that checks for the addition of +/- 0.0.
>
> If !NEGATE, return true if ADDEND is +/-0.0 and, for all X of type
> TYPE, X + ADDEND is the same as X.  If NEGATE, return true if X -
> ADDEND is the same as X.
>
> X + 0 and X - 0 both give X when X is NaN, infinite, or nonzero
> @@ -9864,30 +9869,34 @@ fold_binary_loc (location_t loc,
>if (TREE_CODE (arg1) == COMPOUND_EXPR
>   && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0)))
> {
>   tem = fold_build2_loc (loc, code, type, op0,
>  fold_convert_loc (loc, TREE_TYPE (op1),
>TREE_OPERAND (arg1, 1)));
>   return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg1,
> 0),
>  tem);
> }
>
> -  if (TREE_CODE (arg0) == COND_EXPR || COMPARISON_CLASS_P (arg0))
> +  if (TREE_CODE (arg0) == COND_EXPR
> + || TREE_CODE (arg0) == VEC_COND_EXPR
> + || COMPARISON_CLASS_P (arg0))
> {
>   tem = fold_binary_op_with_conditional_arg (loc, code, type, op0,
> op1,
>  arg0, arg1,
>  /*cond_first_p=*/1);
>   if (tem != NULL_TREE)
> return tem;
> }
>
> -  if (TREE_CODE (arg1) == COND_EXPR || COMPARISON_CLASS_P (arg1))
> +  if (TREE_CODE (arg1) == COND_EXPR
> + || TREE_CODE (arg1) == VEC_COND_EXPR
> + || COMPARISON_CLASS_P (arg1))
> {
>   tem = fold_binary_op_with_conditional_arg (loc, code, type, op0,

Re: [patch] Apply conditional down cast to cgraph.h et.al.

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 1:20 PM, Diego Novillo  wrote:
> On 2012-10-29 15:01 , Lawrence Crowl wrote:
>>
>> On 10/27/12, Marc Glisse  wrote:
>>>
>>> On Fri, 26 Oct 2012, Lawrence Crowl wrote:

 2012-10-26  Lawrence Crowl  >>
>>>
>>> missing '>'
>>
>>
>> Fixed.
>>
 * is-a.h: New.
 (is_a <T> (U*)): New.  Test for is-a relationship.
 (as_a <T> (U*)): New.  Treat as a derived type.
 (dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.
>>>
>>>
>>> I can't find this file in the patch...
>>
>>
>> I forgot to svn add.  Updated patch here.
>>
>>
>> This patch implements generic type query and conversion functions,
>> and applies them to the use of cgraph_node, varpool_node, and symtab_node.
>>
>> The functions are:
>>
>> bool is_a <TYPE> (pointer)
>>Tests whether the pointer actually points to a more derived TYPE.
>>
>> TYPE *as_a <TYPE> (pointer)
>>Converts pointer to a TYPE*.
>>
>> TYPE *dyn_cast <TYPE> (pointer)
>>Converts pointer to TYPE* if and only if "is_a <TYPE> pointer".
>>Otherwise, returns NULL.
>>This function is essentially a checked down cast.
>>
>> These functions reduce compile time and increase type safety when treating
>> a
>> generic item as a more specific item.  In essence, the code change is from
>>
>>if (symtab_function_p (node))
>>  {
>>struct cgraph_node *cnode = cgraph (node);
>>
>>  }
>>
>> to
>>
>>if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
>>  {
>>
>>  }
>>
>> The necessary conditional test defines a variable that holds a known good
>> pointer to the specific item and avoids subsequent conversion calls and
>> the assertion checks that may come with them.
>>
>> When the property test is embedded within a larger condition, the
>> variable
>> declaration gets pulled out of the condition.  (This leaves some room for
>> using the variable inappropriately.)
>>
>>if (symtab_variable_p (node)
>>&& varpool (node)->finalized)
>>  varpool_analyze_node (varpool (node));
>>
>> becomes
>>
>>varpool_node *vnode = dyn_cast <varpool_node> (node);
>>if (vnode && vnode->finalized)
>>  varpool_analyze_node (vnode);
>>
>> Note that we have converted two sets of assertions in the calls to varpool
>> into safe and efficient use of a variable.
>>
>>
>> There are remaining calls to symtab_function_p and symtab_variable_p that
>> do not involve a pointer to a more specific type.  These have been
>> converted
>> to calls to the functions is_a <cgraph_node> and is_a <varpool_node>.  The
>> original predicate functions have been removed.
>>
>> The cgraph.h header defined both a struct and a function with the name
>> varpool_node.  This name overloading can cause some unintuitive error
>> messages
>> when, as is common in C++, one omits the struct keyword when using the
>> type.
>> I have renamed the function to varpool_node_for_decl.
>>
>> Tested on x86_64.
>>
>>
>> Okay for trunk?
>> Index: gcc/ChangeLog
>>
>> 2012-10-29  Lawrence Crowl  
>>
>> * is-a.h: New.
>> (is_a <T> (U*)): New.  Test for is-a relationship.
>> (as_a <T> (U*)): New.  Treat as a derived type.
>> (dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.
>> * cgraph.h (varpool_node): Rename to varpool_node_for_decl.
>> Adjust callers to match.
>> (is_a_helper <cgraph_node>::test (symtab_node_def *)): New.
>> (is_a_helper <varpool_node>::test (symtab_node_def *)): New.
>> (symtab_node_def::try_function): New.  Change most calls to
>> symtab_function_p with calls to dyn_cast <cgraph_node> (p).
>> (symtab_node_def::try_variable): New.  Change most calls to
>> symtab_variable_p with calls to dyn_cast <varpool_node> (p).
>> (symtab_function_p): Remove.  Change callers to use
>>  is_a <cgraph_node> (p) instead.
>> (symtab_variable_p): Remove.  Change callers to use
>>  is_a <varpool_node> (p) instead.
>> * cgraph.c (cgraph_node_for_asm): Remove redundant call to
>> symtab_node_for_asm.
>> * cgraphunit.c (symbol_finalized_and_needed): New.
>> (symbol_finalized): New.
>> (cgraph_analyze_functions): Split complicated conditionals out
>> into
>> above new functions.
>> * Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h.
>
>
> Thanks.  I really like this cleanup.  I have a few questions and comments
> below.  Honza, the patch looks OK to me but it touches a bunch of cgraph
> code, could you please go over it?
>
>
>
>> Index: gcc/is-a.h
>> ===
>> --- gcc/is-a.h  (revision 0)
>> +++ gcc/is-a.h  (revision 0)
>> @@ -0,0 +1,118 @@
>> +/* Dynamic testing for abstract is-a relationships.
>> +   Copyright (C) 2012 Free Software Foundation, Inc.
>> +   Contributed by Lawrence Crowl.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (a

[PATCH] Add gimple load/store predicates, use them from stmt estimates

2012-10-30 Thread Richard Biener

As requested this adds predicates to check whether the lhs of
an assign or call is a store and whether rhs1 of an assignment
is a load.  It uses this in place of the existing, slightly
bogus, check in the stmt estimate code.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2012-10-30  Richard Biener  

* gimple.h (gimple_store_p): New predicate.
(gimple_assign_load_p): Likewise.
* tree-inline.c (estimate_num_insns): Use it.

Index: gcc/gimple.h
===
*** gcc/gimple.h(revision 192984)
--- gcc/gimple.h(working copy)
*** gimple_assign_single_p (gimple gs)
*** 2041,2046 
--- 2041,2071 
&& gimple_assign_rhs_class (gs) == GIMPLE_SINGLE_RHS);
  }
  
+ /* Return true if GS performs a store to its lhs.  */
+ 
+ static inline bool
+ gimple_store_p (gimple gs)
+ {
+   tree lhs = gimple_get_lhs (gs);
+   return lhs && !is_gimple_reg (lhs);
+ }
+ 
+ /* Return true if GS is an assignment that loads from its rhs1.  */
+ 
+ static inline bool
+ gimple_assign_load_p (gimple gs)
+ {
+   tree rhs;
+   if (!gimple_assign_single_p (gs))
+ return false;
+   rhs = gimple_assign_rhs1 (gs);
+   if (TREE_CODE (rhs) == WITH_SIZE_EXPR)
+ return true;
+   rhs = get_base_address (rhs);
+   return (DECL_P (rhs)
+ || TREE_CODE (rhs) == MEM_REF || TREE_CODE (rhs) == TARGET_MEM_REF);
+ }
+ 
  
  /* Return true if S is a type-cast assignment.  */
  
Index: gcc/tree-inline.c
===
*** gcc/tree-inline.c   (revision 192984)
--- gcc/tree-inline.c   (working copy)
*** estimate_num_insns (gimple stmt, eni_wei
*** 3512,3523 
lhs = gimple_assign_lhs (stmt);
rhs = gimple_assign_rhs1 (stmt);
  
!   if (is_gimple_reg (lhs))
!   cost = 0;
!   else
!   cost = estimate_move_cost (TREE_TYPE (lhs));
  
!   if (!is_gimple_reg (rhs) && !is_gimple_min_invariant (rhs))
cost += estimate_move_cost (TREE_TYPE (rhs));
  
cost += estimate_operator_cost (gimple_assign_rhs_code (stmt), weights,
--- 3512,3523 
lhs = gimple_assign_lhs (stmt);
rhs = gimple_assign_rhs1 (stmt);
  
!   cost = 0;
  
!   /* Account for the cost of moving to / from memory.  */
!   if (gimple_store_p (stmt))
!   cost += estimate_move_cost (TREE_TYPE (lhs));
!   if (gimple_assign_load_p (stmt))
cost += estimate_move_cost (TREE_TYPE (rhs));
  
cost += estimate_operator_cost (gimple_assign_rhs_code (stmt), weights,
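The revised accounting is easy to model in isolation. The sketch below mirrors the new logic on a toy statement type; the names (Stmt, MOVE_COST, estimate_cost) are illustrative stand-ins, not GCC's internals, which go through gimple_store_p, gimple_assign_load_p and estimate_move_cost on real gimple operands.

```cpp
#include <cassert>

// Toy model of the revised estimate_num_insns accounting: a statement
// costs one move when it stores to memory, one move when it loads from
// memory, plus its operator cost.  The two flags stand in for the new
// gimple_store_p / gimple_assign_load_p predicates.
struct Stmt
{
  bool lhs_in_memory;   // models gimple_store_p (lhs is not a register)
  bool rhs_is_load;     // models gimple_assign_load_p (rhs1 reads memory)
  int operator_cost;    // models estimate_operator_cost
};

const int MOVE_COST = 1;  // stand-in for estimate_move_cost

int estimate_cost (const Stmt &s)
{
  int cost = 0;
  if (s.lhs_in_memory)   // account for the store
    cost += MOVE_COST;
  if (s.rhs_is_load)     // account for the load
    cost += MOVE_COST;
  return cost + s.operator_cost;
}
```

Note how a pure register-to-register statement now costs only its operator, which is exactly why the address-arithmetic statement in the follow-up message became free.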


RE: [PATCH] [AArch64] Add vcond, vcondu support.

2012-10-30 Thread James Greenhalgh
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Marcus Shawcroft
> Sent: 15 October 2012 12:37
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [AArch64] Add vcond, vcondu support.
> 
> On 09/10/12 12:08, James Greenhalgh wrote:
> >
> > Hi,
> >
> > This patch adds support for vcond and vcondu to the AArch64
> > backend.
> >
> > Tested with no regressions on aarch64-none-elf.
> >
> > OK for aarch64-branch?
> >
> > (If so, someone will have to commit for me, as I do not
> > have commit rights.)
> >
> > Thanks
> > James Greenhalgh
> >
> > ---
> > 2012-09-11  James Greenhalgh
> > Tejas Belagod
> >
> > * config/aarch64/aarch64-simd.md
> > (aarch64_simd_bsl_internal): New pattern.
> > (aarch64_simd_bsl): Likewise.
> > (aarch64_vcond_internal): Likewise.
> > (vcondu): Likewise.
> > (vcond): Likewise.
> > * config/aarch64/iterators.md (UNSPEC_BSL): Add to
> define_constants.
> 
> OK
> /Marcus
> 

Hi,

Committed as revision 192985.

Thanks,
James Greenhalgh
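For readers unfamiliar with the optab: vcond implements an element-wise select, which on AArch64 lowers to a vector compare (producing an all-ones/all-zeros mask per lane) followed by the BSL bitwise-select instruction named in the ChangeLog. A scalar sketch of that lowering, with illustrative names and a fixed greater-than comparison (the real pattern handles all comparison codes):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of what a vcond pattern computes: r[i] = (a[i] > b[i]) ? x[i] : y[i].
// The compare yields an all-ones or all-zeros mask per element, and the
// select is done bitwise, matching the semantics of AArch64's BSL.
template <typename T, std::size_t N>
std::array<T, N> vcond_gt (const std::array<T, N> &a, const std::array<T, N> &b,
                           const std::array<T, N> &x, const std::array<T, N> &y)
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    {
      T mask = a[i] > b[i] ? ~T (0) : T (0);  // compare -> all-ones / all-zeros
      r[i] = (x[i] & mask) | (y[i] & ~mask);  // bitwise select (BSL)
    }
  return r;
}
```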


Re: [patch] Apply conditional down cast to cgraph.h et.al.

2012-10-30 Thread Jan Hubicka
> 2012-10-29  Lawrence Crowl  
> 
>   * is-a.h: New.
>   (is_a <T> (U*)): New.  Test for is-a relationship.
>   (as_a <T> (U*)): New.  Treat as a derived type.
>   (dyn_cast <T> (U*)): New.  Conditionally cast based on is_a.
>   * cgraph.h (varpool_node): Rename to varpool_node_for_decl.
>   Adjust callers to match.
>   (is_a_helper <cgraph_node>::test (symtab_node_def *)): New.
>   (is_a_helper <varpool_node>::test (symtab_node_def *)): New.
>   (symtab_node_def::try_function): New.  Change most calls to
>   symtab_function_p with calls to dyn_cast <cgraph_node> (p).
>   (symtab_node_def::try_variable): New.  Change most calls to
>   symtab_variable_p with calls to dyn_cast <varpool_node> (p).
>   (symtab_function_p): Remove.  Change callers to use
> is_a <cgraph_node> (p) instead.
>   (symtab_variable_p): Remove.  Change callers to use
> is_a <varpool_node> (p) instead.
>   * cgraph.c (cgraph_node_for_asm): Remove redundant call to
>   symtab_node_for_asm.
>   * cgraphunit.c (symbol_finalized_and_needed): New.
>   (symbol_finalized): New.
>   (cgraph_analyze_functions): Split complicated conditionals out into
>   above new functions.
>   * Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h.
> 

the patch is OK,
thanks!
Honza
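The idiom this patch introduces can be illustrated with a minimal self-contained sketch. This is not GCC's actual is-a.h (which dispatches through an is_a_helper template specialized per type, with no tag field); the enum-tagged base here is only a stand-in so the three free functions can be shown working:

```cpp
#include <cassert>
#include <cstddef>

// Minimal model of the is_a / as_a / dyn_cast idiom from is-a.h.
// A tagged base stands in for symtab_node_def; the real code tests
// symbol kinds via an is_a_helper<T>::test specialization instead.
struct symtab_node
{
  enum kind { FUNCTION, VARIABLE } type;
};
struct cgraph_node : symtab_node {};
struct varpool_node : symtab_node {};

template <typename T> bool is_a (symtab_node *);
template <> bool is_a <cgraph_node> (symtab_node *p)
{ return p->type == symtab_node::FUNCTION; }
template <> bool is_a <varpool_node> (symtab_node *p)
{ return p->type == symtab_node::VARIABLE; }

// as_a: assert-checked downcast, replacing the old manual assertions.
template <typename T> T *as_a (symtab_node *p)
{ assert (is_a <T> (p)); return static_cast <T *> (p); }

// dyn_cast: conditional downcast, returning NULL when the test fails.
template <typename T> T *dyn_cast (symtab_node *p)
{ return is_a <T> (p) ? static_cast <T *> (p) : NULL; }
```

The payoff described in the thread is that `if (cgraph_node *cn = dyn_cast <cgraph_node> (node))` tests and converts in one step, replacing the paired symtab_function_p check plus cast.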


Re: RFA: patch to fix PR55116

2012-10-30 Thread H.J. Lu
On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu  wrote:
> On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford
>  wrote:
>> "H.J. Lu"  writes:
>>> On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu  wrote:
 On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu  wrote:
> On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov
>  wrote:
>> On 12-10-29 12:21 PM, Richard Sandiford wrote:
>>>
>>> Vladimir Makarov  writes:

 H.J. in

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116

 reported an interesting address

 (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
   (const_int 2 [0x2]))
   (symbol_ref:SI ("glob_vol_int_arr") <var_decl 0x703c2720 glob_vol_int_arr>)) 0)
   (const_int 4294967295 [0x]))

 which can not be correctly extracted.  Here `and' with `subreg'
 behaves as an address mutation.

 The following patch fixes the problem.

 Ok to commit, Richard?
>>>
>>> Heh, I wondered if subregs might still be used like that, and was almost
>>> tempted to add them just in case.
>>>
>>> I think this particular case is really a failed canonicalisation and 
>>> that:
>>>
>>> (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0x))
>>>
>>> ought to be:
>>>
>>> (zero_extend:DI (foo:SI ...))
>>
>> Yes, that was my thought too.
>>
>>> But I know I've approved MIPS patches to accept
>>> (and:DI ... (const_int 0x)) as an alternative.
>>>
 Index: rtlanal.c
 ===
 --- rtlanal.c   (revision 192942)
 +++ rtlanal.c   (working copy)
 @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
  else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
   /* (and ... (const_int -X)) is used to align to X bytes.  */
   loc = &XEXP (*loc, 0);
 +  else if (code == SUBREG
 +  && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
 +   /* (subreg (operator ...) ...) usually inside and is used for
 +  mode conversion too.  */
 +   loc = &XEXP (*loc, 0);
>>>
>>> I think the condition should be:
>>>
>>>else if (code == SUBREG
>>> && !OBJECT_P (SUBREG_REG (*loc))
>>> && subreg_lowpart (*loc))
>>>
>>> OK with that change, if it works.
>>>
>> Yes, it works.
>> I've submitted the following patch.
>>
>
> It doesn't work right.  I will create a new testcase.
>


>>>
>>> This patch limits SUBREG to Pmode.  Tested on Linux/x86-64.
>>> OK to install?
>>>
>>> Thanks.
>>
>> The address in this case is:
>>
>> (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
>> (const_int 8 [0x8]))
>> (subreg:SI (plus:DI (reg/f:DI 20 frame)
>> (const_int 32 [0x20])) 0))
>>
>> which after Uros's subreg simplification patch shouldn't be allowed:
>> the subreg ought to be on the frame register rather than the plus.
>>
>> The attached patch seems to work for the testcase.  Does it work
>> more generally?
>>
>> Richard
>>
>>
>> gcc/
>> * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
>> rather than gen_rtx_SUBREG.
>>
>> Index: gcc/lra-eliminations.c
>> ===
>> --- gcc/lra-eliminations.c  (revision 192983)
>> +++ gcc/lra-eliminations.c  (working copy)
>> @@ -550,7 +550,8 @@
>>   return x;
>> }
>>   else
>> -   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
>> +   return simplify_gen_subreg (GET_MODE (x), new_rtx,
>> +   GET_MODE (new_rtx), SUBREG_BYTE (x));
>> }
>>
>>return x;
>
> I am running the full test.
>

It works.  Can you check in your patch?  I will check in
my testcase.

Thanks.


-- 
H.J.


Re: RFA: patch to fix PR55116

2012-10-30 Thread Richard Sandiford
"H.J. Lu"  writes:
> On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu  wrote:
>> On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford
>>  wrote:
>>> The address in this case is:
>>>
>>> (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
>>> (const_int 8 [0x8]))
>>> (subreg:SI (plus:DI (reg/f:DI 20 frame)
>>> (const_int 32 [0x20])) 0))
>>>
>>> which after Uros's subreg simplification patch shouldn't be allowed:
>>> the subreg ought to be on the frame register rather than the plus.
>>>
>>> The attached patch seems to work for the testcase.  Does it work
>>> more generally?
>>>
>>> Richard
>>>
>>>
>>> gcc/
>>> * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
>>> rather than gen_rtx_SUBREG.
>>>
>>> Index: gcc/lra-eliminations.c
>>> ===
>>> --- gcc/lra-eliminations.c  (revision 192983)
>>> +++ gcc/lra-eliminations.c  (working copy)
>>> @@ -550,7 +550,8 @@
>>>   return x;
>>> }
>>>   else
>>> -   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
>>> +   return simplify_gen_subreg (GET_MODE (x), new_rtx,
>>> +   GET_MODE (new_rtx), SUBREG_BYTE 
>>> (x));
>>> }
>>>
>>>return x;
>>
>> I am running the full test.
>>
>
> It works.  Can you check in your patch?  I will check in
> my testcase.

Thanks.  Vlad, is the patch OK?

Richard


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-10-30 Thread Diego Novillo
On Mon, Oct 29, 2012 at 1:56 PM, Jakub Jelinek  wrote:
> Status
> ==
>
> I'd like to close the stage 1 phase of GCC 4.8 development
> on Monday, November 5th.  If you have still patches for new features you'd
> like to see in GCC 4.8, please post them for review soon.  Patches
> posted before the freeze, but reviewed shortly after the freeze, may
> still go in, further changes should be just bugfixes and documentation
> fixes.

I will be committing the VEC overhaul soon.  With any luck this week,
but PCH and gengtype are giving me a lot of grief.


Diego.


Re: [Patch] Potential fix for PR55033

2012-10-30 Thread Sebastian Huber

On 10/26/2012 02:22 PM, Sebastian Huber wrote:

Hello,

here is a test case for PR55033.



Is there something wrong with this test case?  It compiles well with Alan's 
patch.

--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.




Re: [Patch] Potential fix for PR55033

2012-10-30 Thread Sebastian Huber

Hello,

what needs to be done to get this committed?

--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.




Re: [PATCH] Add gimple load/store predicates, use them from stmt estimates

2012-10-30 Thread Richard Biener
On Tue, 30 Oct 2012, Richard Biener wrote:

> 
> As requested this adds predicates to check whether the lhs of
> an assign or call is a store and whether rhs1 of an assignment
> is a load.  It uses this in place of the existing, slightly
> bogus, check in the stmt estimate code.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Committed with the following adjustment: we would now inline foo,
since we no longer account memory move costs for stmts like

  pInput_9 = &MEM[(void *)pInput_1 + 8B];

adding noinline looks reasonable anyway (we'll fix the above
to cost something again with a followup).

Richard.

Index: gcc/testsuite/gcc.dg/vect/slp-perm-2.c
===
*** gcc/testsuite/gcc.dg/vect/slp-perm-2.c  (revision 192984)
--- gcc/testsuite/gcc.dg/vect/slp-perm-2.c  (working copy)
***
*** 12,18 

  #define N 16

! void foo (unsigned int *__restrict__ pInput, unsigned int *__restrict__ 
pOutput)
  {
unsigned int i, a, b;

--- 12,19 

  #define N 16

! void __attribute__((noinline))
! foo (unsigned int *__restrict__ pInput, unsigned int *__restrict__ 
pOutput)
  {
unsigned int i, a, b;




Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
> And tree expressions don't have TREE_BLOCK before gimple-low either.
> So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
> stmts as well as all expression in the operands.  It is not overwriting
> anything, no frontend sets TREE_BLOCK for any expression, the way frontends
> associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND
> after gimplification, and it is gimple-low responsibility to set it.
>
> In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK
> initially.  Before the location_t changes, again it was gimple-low that
> was the first setter of TREE_BLOCK, which was valid for all
> IS_EXPR_CODE_CLASS.  So, IMNSHO gimple-low should merge location_t
> with block for all gimple stmts and all tree expressions used in its
> operands.  It shouldn't be set on trees that can be shared, so
> say decls etc. should keep using just location_t's without associated block.
> So perhaps the right test for gimple-low setting of block is
> CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp).
>
> Jakub

I kind of like this idea. What do you guys think?

Thanks,
Dehao


Re: RFA: patch to fix PR55116

2012-10-30 Thread Vladimir Makarov

On 10/30/2012 09:39 AM, Richard Sandiford wrote:

"H.J. Lu"  writes:

On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu  wrote:

On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford
 wrote:

The address in this case is:

(plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
 (const_int 8 [0x8]))
 (subreg:SI (plus:DI (reg/f:DI 20 frame)
 (const_int 32 [0x20])) 0))

which after Uros's subreg simplification patch shouldn't be allowed:
the subreg ought to be on the frame register rather than the plus.

The attached patch seems to work for the testcase.  Does it work
more generally?

Richard


gcc/
 * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
 rather than gen_rtx_SUBREG.

Index: gcc/lra-eliminations.c
===
--- gcc/lra-eliminations.c  (revision 192983)
+++ gcc/lra-eliminations.c  (working copy)
@@ -550,7 +550,8 @@
   return x;
 }
   else
-   return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
+   return simplify_gen_subreg (GET_MODE (x), new_rtx,
+   GET_MODE (new_rtx), SUBREG_BYTE (x));
 }

return x;

I am running the full test.


It works.  Can you check in your patch?  I will check in
my testcase.

Thanks.  Vlad, is the patch OK?


Sure.  Thanks.



Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen  wrote:
>> And tree expressions don't have TREE_BLOCK before gimple-low either.
>> So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
>> stmts as well as all expression in the operands.  It is not overwriting
>> anything, no frontend sets TREE_BLOCK for any expression, the way frontends
>> associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND
>> after gimplification, and it is gimple-low responsibility to set it.
>>
>> In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK
>> initially.  Before the location_t changes, again it was gimple-low that
>> was the first setter of TREE_BLOCK, which was valid for all
>> IS_EXPR_CODE_CLASS.  So, IMNSHO gimple-low should merge location_t
>> with block for all gimple stmts and all tree expressions used in its
>> operands.  It shouldn't be set on trees that can be shared, so
>> say decls etc. should keep using just location_t's without associated block.
>> So perhaps the right test for gimple-low setting of block is
>> CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp).
>>
>> Jakub
>
> I kind of like this idea. What do you guys think?

I question the need of BLOCK info on expression trees.  If BLOCKs are
relevant then the tree ends up referencing a declaration with a BLOCK as
context, no?  Thus, the case

  int tem, a;
  {
int a;
...
tem = a;
  }
  int b = tem + 5;

where we may end up with gimple like

  b = a + 5;

thus mixing two BLOCKs inside a stmt (and no expression trees to
attach different BLOCKs) is no different from the case where we end
up with expression trees.

Thus my original question - why isn't a NULL BLOCK treated the same
as UNKNOWN_LOCATION?  Or rather, _why_ does Micha's patch not work?
Did you analyze the guality fails?

Thanks,
Richard.

> Thanks,
> Dehao


Re: [PATCH] Replace const_vector with match_operand in sse.md

2012-10-30 Thread Andrey Turetskiy
I changed the patch according to Uros' remarks. Please, have a look.

Changelog:

2012-10-30  Andrey Turetskiy  

   * config/i386/i386.c (bdesc_args): Rename CODE_FOR_avx2_umulhrswv16hi3 to
   CODE_FOR_avx2_pmulhrswv16hi3.
   * config/i386/predicates.md (const1_operand): Extend for vectors.
   * config/i386/sse.md (ssse3_avx2): Extend.
   (ssedoublemode): Ditto.
   (<sse2_avx2>_uavg<mode>3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
   avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
   (*<sse2_avx2>_uavg<mode>3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
   *avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
   (PMULHRSW): New.
   (<ssse3_avx2>_pmulhrsw<mode>3): Merge avx2_umulhrswv16hi3,
   ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
   (*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.
   (*ssse3_pmulhrswv8hi3): Ditto.
   (*ssse3_pmulhrswv4hi3): Ditto.

---
Best regards,
Andrey Turetskiy

On Wed, Oct 24, 2012 at 5:36 PM, Uros Bizjak  wrote:
> On Wed, Oct 24, 2012 at 3:01 PM, Andrey Turetskiy
>  wrote:
>
>> On Tue, Oct 23, 2012 at 2:45 PM, Andrey Turetskiy
>>  wrote:
>>> Hi,
>>>
>>> This patch replaces large const_vector constructions with
>>> match_operand in sse.md to decrease its size.
>>> Is it ok?
>
> No, you don't have to touch generic expand machinery.
>
> In the expander, use (match_dup X), and declare "operands[X] =
> CONST1_RTX (..some_mode..)" in preparation statement. In unnamed
> patterns, use const1_operand operand predicate. You should extend
> existing const1_operand in the same way as const0_operand.
>
> This approach is not compatible with named insn patterns, which
> duplicate its functionality as pattern matcher and as an expander.
>
> Uros.


const_vector_replace.patch
Description: Binary data
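The patterns being merged all implement PMULHRSW ("packed multiply high with round and scale"): per 16-bit element, the high product is rounded by adding 1 before the final shift, and that vector of 1s is the const_vector the patch replaces with a const1_operand match. A scalar reference model of one element (name mine):

```cpp
#include <cassert>
#include <cstdint>

// Scalar reference for one PMULHRSW element:
//   dst = ((a * b) >> 14 + 1) >> 1, on signed 16-bit inputs.
// The "+ 1" is the rounding term that appears in the RTL as a vector
// of ones -- the operand the patch matches with const1_operand.
int16_t pmulhrsw (int16_t a, int16_t b)
{
  int32_t t = (int32_t (a) * int32_t (b)) >> 14;  // high product in Q1.14
  return int16_t ((t + 1) >> 1);                  // round, then scale
}
```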


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Michael Matz
Hi,

On Tue, 30 Oct 2012, Richard Biener wrote:

> On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen  wrote:
> >> And tree expressions don't have TREE_BLOCK before gimple-low either.
> >> So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
> >> stmts as well as all expression in the operands.

That would be a step away from the ideal situation, so we rather should 
first analyze why the testcase fails with my patch.  I expected some 
fallout and am actually surprised it's only one testcase :)

What we should end up with in the ideal world is that we simply have no 
expressions in gimple (and hence no place to store any locations for 
them), except via gimple statements.

> I question the need of BLOCK info on expression trees.  If BLOCKs are 
> relevant then the tree ends up referencing a declaration with a BLOCK as 
> context, no?  Thus, the case
> 
>   int tem, a;
>   {
> int a;
> ...
> tem = a;
>   }
>   int b = tem + 5;
> 
> where we may end up with gimple like
> 
>   b = a + 5;
> 
> thus mixing two BLOCKs inside a stmt (and no expression trees to
> attach different BLOCKs) is no different from the case where we end
> up with expression trees.
> 
> Thus my original question - why isn't a NULL BLOCK treated the same
> as UNKNOWN_LOCATION?

Since merging location and block a null BLOCK doesn't imply no location.  
It can very well have a location without a block.  What we might want to 
imply is that a null BLOCK implies the BLOCK from the statement.  But as 
you say, first we should find out why my patch breaks the one testcase.

> Or rather, _why_ does Micha's patch not work? Did you analyze the guality
> fails?


Ciao,
Michael.


Re: [PR54693] loss of debug info in jump threading and loop ivopts

2012-10-30 Thread Richard Biener
On Fri, Oct 26, 2012 at 8:30 AM, Alexandre Oliva  wrote:
> Both jump threading and loop induction variable optimizations were
> dropping useful debug information, and it took improvements in both for
> debug info about relevant variables in the enclosed testcase to survive
> all the way to the end.
>
> The first problem was that jump threading could bypass blocks containing
> debug stmts, losing the required bindings.  The solution was to
> propagate bypassed debug insns into the target of the bypass.  Even if
> the new confluence ends up introducing new PHI nodes, the propagated
> debug binds will resolve to them, just as intended.  This is implemented
> in the 4th patch below: vta-jump-thread-prop-debug-pr54693.patch
>
> The second part of the problem was that, when performing loop ivopts,
> we'd often drop PHI nodes and other SSA names for unused ivs.  Although
> we had code to propagate SSA DEFs into debug uses upon their removal,
> this couldn't take care of PHI nodes (no debug phis or conditional debug
> binds), and it was precisely at the removal of a PHI node that we
> dropped debug info for the loop in the provided testcase.  Once Jakub
> figured out how to compute an unused iv out of available ivs (thanks!),
> it was easy to introduce debug temps with the expressions to compute
> them, so that debug binds would remain correct as long as the unused iv
> can still be computed out of the others.  (If it can't, we'll still try
> propagation, but may end up losing at the PHI nodes).  I had thought
> that replacing only the PHI nodes would suffice, but it turned out that
> replacing all unused iv defs with their equivalent used-IV expressions
> got us better coverage, so this is what the 5th patch below does:
> vta-ivopts-replace-dropped-phis-pr54693.patch

+ if (count > 1)
+   {
+ tree vexpr = make_node (DEBUG_EXPR_DECL);
+ DECL_ARTIFICIAL (vexpr) = 1;
+ TREE_TYPE (vexpr) = TREE_TYPE (comp);
+ if (SSA_NAME_VAR (def))
+   DECL_MODE (vexpr) = DECL_MODE (SSA_NAME_VAR (def));
+ else
+   DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (vexpr));

simply always use the TREE_TYPE path.  TYPE_MODE is always valid
for SSA_NAMEs.

+ FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def)
+   {
+ if (!gimple_debug_bind_p (stmt))
+   continue;
+
+ FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+   SET_USE (use_p, comp);
+
+ if (!comp)
+   BREAK_FROM_IMM_USE_STMT (imm_iter);

how does comp magically become NULL_TREE here?

Btw, what's all the fuss with IV candidates, etc?  At least for non-PHIs
I don't see why the regular release_ssa_name way of doing things does
not work.  IVOPTs is slow enough ...

Richard.


>
> Regression testing revealed -fcompare-debug regressions exposed by these
> patches.
>
> x86-specific code introduces pre-reload scheduling dependencies between
> calls and likely-spilled parameter registers, but it does so in a way
> that's AFAICT buggy, and fragile to the presence of debug insns at the
> top of a block: we won't introduce a dependency for the first insn of
> the block, even if we'd rather have such a dependency.  This fails to
> achieve the intended effect, and it also causes codegen differences when
> the first insn in the block happens to be a debug insn, for then we will
> add the intended dependency.  The first patch below,
> vta-stabilize-i386-call-args-sched-pr54693.patch, skips leading debug
> insns, so as to remove the difference, and moves the end of the backward
> scan to the insn before the first actual insn, so that we don't refrain
> from considering it for dependencies.  This in turn required an
> additional test to make sure we wouldn't go past the nondebug head if
> first_arg happened to be the head.
>
> The introduction of debug insns at new spots also exposed a bug in loop
> unrolling: we'd unroll a loop a different number of times depending on
> whether or not its latch is an empty block.  The propagation or
> introduction of debug insns in previously-empty latch blocks caused
> loops to be unrolled a different number of times depending on the
> presence of the debug insns, which triggered -fcompare-debug errors.
> The fix was to count only nondebug insns towards the decision on whether
> the latch block was empty.  This is implemented in the second patch
> below: vta-stabilize-loop-unroll-empty-latch-check-pr54693.patch
>
> Guality testsuite regressions given the patches above revealed that the
> fast DCE global dead debug substitution introduced for PR54551 was not
> correct: it was possible for us to visit, for the last time, a block
> with a REG used in debug stmts after its death before we visited the
> block with the debug use.  As a result, we'd fail to emit a debug temp
> at the not-revisited block, and the debug temp bindings intr

Non-dominating loop bounds in tree-ssa-loop-niter 1/4

2012-10-30 Thread Jan Hubicka
Hi,
this is the first patch of a change to tree-ssa-loop-niter to consider bounds
that are not in a block dominating the latch.  This patch makes them be
recorded, but they are not used yet.  I plan to follow up with:

1) patch to add simple shortest path walk at the end of 
estimate_numbers_of_iterations_loop
   determining bound of i.e.
   int a[10];
   int b[10];
   for (i=0;i<...
-  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src))
+  if (every_iteration
+  && !dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src))
 return false;
 
   niter->assumptions = boolean_false_node;
@@ -2568,6 +2572,11 @@ record_estimate (struct loop *loop, tree
   loop->bounds = elt;
 }
 
+  /* If statement is executed on every path to the loop latch, we can directly
+ infer the upper bound on the # of iterations of the loop.  */
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (at_stmt)))
+return;
+
   /* Update the number of iteration estimates according to the bound.
  If at_stmt is an exit then the loop latch is executed at most BOUND times,
  otherwise it can be executed BOUND + 1 times.  We will lower the estimate
@@ -2651,7 +2660,6 @@ struct ilb_data
 {
   struct loop *loop;
   gimple stmt;
-  bool reliable;
 };
 
 static bool
@@ -2660,7 +2668,7 @@ idx_infer_loop_bounds (tree base, tree *
   struct ilb_data *data = (struct ilb_data *) dta;
   tree ev, init, step;
   tree low, high, type, next;
-  bool sign, upper = data->reliable, at_end = false;
+  bool sign, upper = true, at_end = false;
   struct loop *loop = data->loop;
 
   if (TREE_CODE (base) != ARRAY_REF)
@@ -2737,14 +2745,12 @@ idx_infer_loop_bounds (tree base, tree *
STMT is guaranteed to be executed in every iteration of LOOP.*/
 
 static void
-infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref,
-   bool reliable)
+infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref)
 {
   struct ilb_data data;
 
   data.loop = loop;
   data.stmt = stmt;
-  data.reliable = reliable;
   for_each_index (&ref, idx_infer_loop_bounds, &data);
 }
 
@@ -2753,7 +2759,7 @@ infer_loop_bounds_from_ref (struct loop 
executed in every iteration of LOOP.  */
 
 static void
-infer_loop_bounds_from_array (struct loop *loop, gimple stmt, bool reliable)
+infer_loop_bounds_from_array (struct loop *loop, gimple stmt)
 {
   if (is_gimple_assign (stmt))
 {
@@ -2763,10 +2769,10 @@ infer_loop_bounds_from_array (struct loo
   /* For each memory access, analyze its access function
 and record a bound on the loop iteration domain.  */
   if (REFERENCE_CLASS_P (op0))
-   infer_loop_bounds_from_ref (loop, stmt, op0, reliable);
+   infer_loop_bounds_from_ref (loop, stmt, op0);
 
   if (REFERENCE_CLASS_P (op1))
-   infer_loop_bounds_from_ref (loop, stmt, op1, reliable);
+   infer_loop_bounds_from_ref (loop, stmt, op1);
 }
   else if (is_gimple_call (stmt))
 {
@@ -2775,13 +2781,13 @@ infer_loop_bounds_from_array (struct loo
 
   lhs = gimple_call_lhs (stmt);
   if (lhs && REFERENCE_CLASS_P (lhs))
-   infer_loop_bounds_from_ref (loop, stmt, lhs, reliable);
+   infer_loop_bounds_from_ref (loop, stmt, lhs);
 
   for (i = 0; i < n; i++)
{
  arg = gimple_call_arg (stmt, i);
  if (REFERENCE_CLASS_P (arg))
-   infer_loop_bounds_from_ref (loop, stmt, arg, reliable);
+   infer_loop_bounds_from_ref (loop, stmt, arg);
}
 }
 }
@@ -2910,14 +2916,15 @@ infer_loop_bounds_from_undefined (struct
 
   /* If BB is not executed in each iteration of the loop, we cannot
 use the operations in it to infer reliable upper bound on the
-# of iterations of the loop.  However, we can use it as a guess.  */
+# of iterations of the loop.  However, we can use it as a guess. 
+Reliable guesses come only from array bounds.  */
   reliable = dominated_by_p (CDI_DOMINATORS, loop->latch, bb);
 
   for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
{
  gimple stmt = gsi_stmt (bsi);
 
- infer_loop_bounds_from_array (loop, stmt, reliable);
+ infer_loop_bounds_from_array (loop, stmt);
 
  if (reliable)
 {
@@ -3078,7 +3085,7 @@ estimate_numbers_of_iterations_loop (str
   likely_exit = single_likely_exit (loop);
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
 {
-  if (!number_of_iterations_exit (loop, ex, &niter_desc, false))
+  if (!number_of_iterations_exit (loop, ex, &niter_desc, false, false))
continue;
 
   niter = niter_desc.niter;
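The distinction the patch draws can be seen in a small example (code mine, not from the patch). In f1 the access a[i] executes on every iteration, so its block dominates the latch and i <= 9 is a reliable upper bound on the whole loop; in f2 the access is guarded, so it only bounds the iterations that reach it and may no longer be used directly as a loop bound:

```cpp
#include <cassert>

// a has 10 elements, so any a[i] access implies i <= 9 (anything more
// is undefined behavior the compiler may assume away).
int a[10];

// Access on every iteration: dominates the latch, reliable loop bound.
int f1 (int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

// Conditional access: does not dominate the latch, so after this patch
// it is recorded but not used as a direct upper bound.
int f2 (int n, const int *flag)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    if (flag[i % 2])
      s += a[i];
  return s;
}
```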


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 3:49 PM, Michael Matz  wrote:
> Hi,
>
> On Tue, 30 Oct 2012, Richard Biener wrote:
>
>> On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen  wrote:
>> >> And tree expressions don't have TREE_BLOCK before gimple-low either.
>> >> So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple
>> >> stmts as well as all expression in the operands.
>
> That would be a step away from the ideal situation, so we rather should
> first analyze why the testcase fails with my patch.  I expected some
> fallout and am actually surprised it's only one testcase :)
>
> What we should end up with in the ideal world is that we simply have no
> expressions in gimple (and hence no place to store any locations for
> them), except via gimple statements.
>
>> I question the need of BLOCK info on expression trees.  If BLOCKs are
>> relevant then the tree ends up referencing a declaration with a BLOCK as
>> context, no?  Thus, the case
>>
>>   int tem, a;
>>   {
>> int a;
>> ...
>> tem = a;
>>   }
>>   int b = tem + 5;
>>
>> where we may end up with gimple like
>>
>>   b = a + 5;
>>
>> thus mixing two BLOCKs inside a stmt (and no expression trees to
>> attach different BLOCKs) is no different from the case where we end
>> up with expression trees.
>>
>> Thus my original question - why isn't a NULL BLOCK treated the same
>> as UNKNOWN_LOCATION?
>
> Since merging location and block a null BLOCK doesn't imply no location.
> It can very well have a location without a block.  What we might want to
> imply is that a null BLOCK implies the BLOCK from the statement.  But as
> you say, first we should find out why my patch breaks the one testcase.

Yes, I mean we happily leave the stmt line location the same if we have
a stmt with UNKNOWN_LOCATION (thus it inherits that of the previous stmt),
we should do the same with BLOCKs - if a stmt has a location with NULL
BLOCK then it should inherit the block info from the previous stmt.

Richard.

>> Or rather, _why_ does Michas patch not work? Did you analyze the guality
>> fails?
>
>
> Ciao,
> Michael.


Re: [PATCH] Inter-bb range test optimization (PRs tree-optimization/19105, tree-optimization/21643, tree-optimization/46309)

2012-10-30 Thread Richard Biener
On Fri, Oct 26, 2012 at 7:27 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch extends optimize_range_tests optimization, so that it
> handles also the cases where the truth && or || has been gimplifed
> as a series of GIMPLE_CONDs or mixture thereof and BIT_{AND,IOR}_EXPR
> stmts.
>
> Example of code it handles is e.g.:
>   :
>   v1_3 = a_2(D) != 3;
>   v2_4 = a_2(D) != 1;
>   v3_5 = a_2(D) != 4;
>   v4_6 = a_2(D) != 2;
>   v5_7 = a_2(D) != 7;
>   v6_8 = a_2(D) != 5;
>   v7_9 = a_2(D) != 8;
>   v8_10 = a_2(D) != 6;
>   _11 = v1_3 & v2_4;
>   if (_11 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   _13 = v3_5 & v4_6;
>   if (_13 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   _14 = v5_7 & v6_8;
>   if (_14 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   _15 = v7_9 & v8_10;
>   if (_15 != 0)
> goto ;
>   else
> goto ;
>
> or:
>
>   :
>   _3 = c_2(D) == 34;
>   _4 = c_2(D) == 32;
>   _5 = _3 | _4;
>   if (_5 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   _8 = c_2(D) <= 31;
>   _7 = (int) _8;
>
>   :
>   # _1 = PHI <_7(3), 1(2)>
>
> It is implemented in reassociate_bb, as that is where all the
> infrastructure already is, but isn't done as part of normal
> reassociation, but before reassociating first bb that represents
> a series of && or || operands.  As post-dominator sons aren't
> particularly ordered, it can happen that the optimization is performed
> on the first, last or middle bb of the series of bbs, so it searches
> both forward and backward to find suitable basic blocks.
> The last bb in the series (last_bb) can be of the form of bb 3 in the
> second example, which is how certain sequencies are gimplified when
> assigning && or || result to a variable or returning it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok, but the code could really use some more comments - functions
not fitting in my 80x24 terminal without seeing any comment on what happens
here are a maintenance nightmare.

Thanks,
Richard.

> 2012-10-26  Jakub Jelinek  
>
> PR tree-optimization/19105
> PR tree-optimization/21643
> PR tree-optimization/46309
> * tree-ssa-reassoc.c (init_range_entry): Add STMT argument
> and use it if EXP is NULL.
> (update_range_test): Handle OPCODE equal to ERROR_MARK
> and oe->op NULL.
> (optimize_range_tests): Likewise.
> (final_range_test_p, suitable_cond_bb, no_side_effect_bb, get_ops,
> maybe_optimize_range_tests): New functions.
> (reassociate_bb): Call maybe_optimize_range_tests if last
> stmt of bb is GIMPLE_COND that hasn't been visited yet.
>
> * gcc.dg/pr19105.c: New test.
> * gcc.dg/pr21643.c: New test.
> * gcc.dg/pr46309-2.c: New test.
> * gcc.c-torture/execute/pr46309.c: New test.
>
> --- gcc/tree-ssa-reassoc.c.jj   2012-10-25 09:21:08.657049321 +0200
> +++ gcc/tree-ssa-reassoc.c  2012-10-26 15:51:13.398025229 +0200
> @@ -1,5 +1,5 @@
>  /* Reassociation for trees.
> -   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011
> +   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011, 2012
> Free Software Foundation, Inc.
> Contributed by Daniel Berlin 
>
> @@ -1713,10 +1713,12 @@ struct range_entry
>  };
>
>  /* This is similar to make_range in fold-const.c, but on top of
> -   GIMPLE instead of trees.  */
> +   GIMPLE instead of trees.  If EXP is non-NULL, it should be
> +   an SSA_NAME and STMT argument is ignored, otherwise STMT
> +   argument should be a GIMPLE_COND.  */
>
>  static void
> -init_range_entry (struct range_entry *r, tree exp)
> +init_range_entry (struct range_entry *r, tree exp, gimple stmt)
>  {
>int in_p;
>tree low, high;
> @@ -1727,7 +1729,8 @@ init_range_entry (struct range_entry *r,
>r->strict_overflow_p = false;
>r->low = NULL_TREE;
>r->high = NULL_TREE;
> -  if (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> +  if (exp != NULL_TREE
> +  && (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp
>  return;
>
>/* Start with simply saying "EXP != 0" and then look at the code of EXP
> @@ -1735,12 +1738,14 @@ init_range_entry (struct range_entry *r,
>   happen, but it doesn't seem worth worrying about this.  We "continue"
>   the outer loop when we've changed something; otherwise we "break"
>   the switch, which will "break" the while.  */
> -  low = build_int_cst (TREE_TYPE (exp), 0);
> +  low = exp ? build_int_cst (TREE_TYPE (exp), 0) : boolean_false_node;
>high = low;
>in_p = 0;
>strict_overflow_p = false;
>is_bool = false;
> -  if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
> +  if (exp == NULL_TREE)
> +is_bool = true;
> +  else if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
>  {
>if (TYPE_UNSIGNED (TREE_TYPE (exp)))
> is_bool = true;
> @@ -1752,25 +1757,35 @@ init_range_entry (struct range_entry *r,
>
>while (1)
>  {
> -  gimple stmt;
>enum tree_code
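
For reference, the kind of C source that gimplifies into the shapes Jakub
shows above can be sketched as follows (a hypothetical example, not one of
the new testcases; the function name is made up):

```c
/* Hypothetical input for the inter-bb range test optimization: a chain
   of inequality tests against 1..8, gimplified into a series of
   GIMPLE_CONDs mixed with BIT_AND_EXPR stmts.  After the patch, reassoc
   can merge these into a single range check along the lines of
   (unsigned) (a - 1) > 7.  */
int
outside_1_to_8 (int a)
{
  return a != 3 && a != 1 && a != 4 && a != 2
      && a != 7 && a != 5 && a != 8 && a != 6;
}
```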

Re: RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_lenth attribute to support the ARC port)

2012-10-30 Thread Richard Biener
On Sun, Oct 28, 2012 at 6:13 PM, Joern Rennecke
 wrote:
> Quoting Richard Biener :
>
>>> Thus, you can allow the length to vary downwards as well as upwards
>>> across iterations with suitable definitions of the @code{length}
>>> attribute
>>> and/or @code{ADJUST_INSN_LENGTH}.  Care has to be taken that this does
>>> not
>>> lead to infinite loops.
>>
>>
>> I don't see that you can shrink length with just suitable lock_length and
>> length attributes.
>
>
> I disagree there (for certain values of 'just'), but we can just agree
> to disagree on this point because...
>
>>  What seems to be the cruical difference is that you
>> apply ADJUST_INSN_LENGTH after combining lock-length-max and length.
>> But then you _decrease_ length with ADJUST_INSN_LENGHT ...
>>
>> Maybe what you really want is ADJUST_INSN_LENGTH_AFTER which is
>> applied afterwards?  Thus,
>
>
> Well, actually, I found a number of problems with ADJUST_INSN_LENGTH:
> - it is not applied to inner insn is a delay slot sequence.  You can
>   sort of work around this by stitching recursive calls together, but
>   then you are faced with another problem:
> - You don't know what the length prior to ADJUST_INSN_LENGTH was.  That
>   was even worse with the non-unique uids where you didn't knew squat
>   about the instructions in the delay slot.
> - As you said, I want the adjustment happen after the maximum calculation.
>   Well, usually.  There are actually special cases where it would be useful
>   to have some sort of maximum calculation in place, to break up
>   alignment-driven cycles, but only applicable with a lesser priority than
>   for the range-driven branch offset calculations.
>
> But adding yet another macro neither does solve all these problems, nor
> would it align with our goal to move away from target macros.
>
> I now found an alternate way to make the ARC port terminate building its
> insn-attrtab.c - it involves using match_test("get_attr_length (insn) == 2")
> instead of eq_attr("[lock_]length" (const_int 2)) - where I really want to
> know if the instruction was considered short in the previous iteration.
>
> So, I have made a patch to replace the ADJUST_INSN_LENGTH macro in final.c
> with an adjust_insn_length hook, for which a default implementation
> using ADJUST_INSN_LENGTH is provided in targhooks.c to provide for an
> easy transition.
> I've looked at the existing ports that use ADJUST_INSN_LENGTH, and some
> seem to prefer to return an adjustment to be added to the length, while
> others prefer to return the new length.  The latter seemed to be slightly
> more numerous, so I went with that.
>
> The hook has two new parameters:
> - a flag that tells it if the insn in question is a delay sequence.
>   The default hook implementation skips the invocation of ADJUST_INSN_LENGTH
>   in this case for the sake of compatibility.
> - a pointer to int to set the number of iteration loops till the length
>   locking feature is supposed to apply to this instruction length when
>   using the increasing size calculations.  The pointed-to value is
> initialized
>   to zero, which means that length locking is always applied (assuming
> final.c
>   uses the increasing length algorithm).  Setting this to a higher number
>   effectively gives the new instruction length a lower priority to be put
>   into uid_lock_length.
>
>> Note that
>>
>>> Care has to be taken that this does not
>>> lead to infinite loops.
>>
>>
>> doesn't convince me that is properly designed ;)
>
>
> With the hook mechanism, it is much harder to create an infinite loop
> inside shorten_branches.  (It would involve something like setting
>  iteration_threshold to MAX_INT and making length decreasing when niter
>  is at MAX_INT, then letting integer overflow of niter take its course.
>  Making it impossible for a port maintainer to get things wrong is not a
>  meaningful goal for GCC, but making it straightforward to get it right is.)
>
> This patch builds on:
> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html
>
> Bootstrapped in revision 192879 on i686-pc-linux-gnu.
> Tested with config-list.mk on x86_64-unknown-linux-gnu.

Apart from the iteration_threshold the hookization would be straight-forward.
Now I cannot decipher from the patch what functional change it introduces ;)

There are also only 10 users of ADJUST_INSN_LENGTH, not enough to
warrant introducing a hook without removing the macro IMHO.

Thanks,
Richard.

> 2012-10-28  Joern Rennecke  
>
> * doc/tm.texi.in (@hook TARGET_ADJUST_INSN_LENGTH): Add.
> * doc/tm.texi: Regenerate.
> * final.c (get_attr_length_1): Use targetm.adjust_insn_length
> instead of ADJUST_INSN_LENGTH.
> (adjust_length): New function.
> (shorten_branches): Use adjust_length instead of ADJUST_INSN_LENGTH.
> * target.def (adjust_insn_length): New hook.
> * targhooks.c (default_adjust_insn_length): New function.
> * targhooks.h (default_adjust_insn_length): Declare.
>
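
The termination concern discussed above can be illustrated with a toy model
(plain C, not GCC code; the instruction count, the lengths, and the 8-byte
threshold are all invented): lengths are recomputed on each pass from the
current total size, but clamped so they never shrink, which guarantees
monotone progress and hence a fixpoint.

```c
/* Toy model of why shorten_branches-style iteration terminates when
   lengths are locked to be non-decreasing.  */
enum { N = 4 };

/* Length insn I wants, given the current total size.  */
static int
needed (int i, int total)
{
  if (i == 0)
    return total <= 8 ? 2 : 4;   /* a "branch": short form only if close */
  return 3;                      /* ordinary insns have a fixed length */
}

int
shorten (void)
{
  int len[N] = { 0, 0, 0, 0 };
  int changed = 1;
  int total = 0;

  while (changed)
    {
      changed = 0;
      total = 0;
      for (int i = 0; i < N; i++)
        total += len[i];
      for (int i = 0; i < N; i++)
        {
          int l = needed (i, total);
          if (l > len[i])        /* the "lock": lengths never shrink */
            {
              len[i] = l;
              changed = 1;
            }
        }
    }
  return total;                  /* fixpoint total: 4 + 3 + 3 + 3 */
}
```

Allowing lengths to shrink again, as the quoted discussion notes, removes
this monotonicity and is what opens the door to oscillation.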

Re: Non-dominating loop bounds in tree-ssa-loop-niter 1/4

2012-10-30 Thread Richard Biener
On Tue, 30 Oct 2012, Jan Hubicka wrote:

> Hi,
> this is the first patch of a change to tree-ssa-loop-niter to consider
> bounds that are not in a block dominating the latch.  This patch makes such
> bounds be recorded but not yet used.  I plan to follow up with:
> 
> 1) patch to add simple shortest path walk at the end of 
> estimate_numbers_of_iterations_loop
>determining bound of i.e.
>int a[10];
>int b[10];
>for (i=0;i<n;i++)
>  if (t())
>q(a[i]);
>  else
>q(a[i]);
> 2) make complete loop unrolling to kill all statements known to not be
> executed
>in the last iteration by __builtin_unreachable to silence part of 
> -Warray-bounds
>warnings currently breaking bootstrap with -O3
> 3) make duplicate_loop_to_header_edge in peeling mode to do the same silencing
>the rest of warnings
> 4) make branch prediction code to drop the prediction on exits not dominating
>latch.
> 
> Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

>   * tree-ssa-loop-niter.c (number_of_iterations_exit): New parameter
>   EVERY_ITERATION with implicit value of true.
>   (record_estimate): Check dominance relationship of the basic block
>   we are estimating on instead of relying on UPPER to be false.
>   (struct ilb_data): Drop RELIABLE.
>   (idx_infer_loop_bounds): Update.
>   (infer_loop_bounds_from_ref): Drop parameter RELIABLE.
>   (infer_loop_bounds_from_array): Drop parameter RELIABLE.
>   (infer_loop_bounds_from_undefined): Update comments and handling
>   of RELIABLE.
>   (estimate_numbers_of_iterations_loop): Record all bounds.
> Index: tree-ssa-loop-niter.c
> ===
> --- tree-ssa-loop-niter.c (revision 192986)
> +++ tree-ssa-loop-niter.c (working copy)
> @@ -1793,12 +1793,15 @@ loop_only_exit_p (const struct loop *loo
> meaning described in comments at struct tree_niter_desc
> declaration), false otherwise.  If WARN is true and
> -Wunsafe-loop-optimizations was given, warn if the optimizer is going to 
> use
> -   potentially unsafe assumptions.  */
> +   potentially unsafe assumptions.  
> +   When EVERY_ITERATION is true, only tests that are known to be executed
> +   every iteration are considered (i.e. only test that alone bounds the 
> loop). 
> + */
>  
>  bool
>  number_of_iterations_exit (struct loop *loop, edge exit,
>  struct tree_niter_desc *niter,
> -bool warn)
> +bool warn, bool every_iteration)
>  {
>gimple stmt;
>tree type;
> @@ -1806,7 +1809,8 @@ number_of_iterations_exit (struct loop *
>enum tree_code code;
>affine_iv iv0, iv1;
>  
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src))
> +  if (every_iteration
> +  && !dominated_by_p (CDI_DOMINATORS, loop->latch, exit->src))
>  return false;
>  
>niter->assumptions = boolean_false_node;
> @@ -2568,6 +2572,11 @@ record_estimate (struct loop *loop, tree
>loop->bounds = elt;
>  }
>  
> +  /* If statement is executed on every path to the loop latch, we can 
> directly
> + infer the upper bound on the # of iterations of the loop.  */
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (at_stmt)))
> +return;
> +
>/* Update the number of iteration estimates according to the bound.
>   If at_stmt is an exit then the loop latch is executed at most BOUND 
> times,
>   otherwise it can be executed BOUND + 1 times.  We will lower the 
> estimate
> @@ -2651,7 +2660,6 @@ struct ilb_data
>  {
>struct loop *loop;
>gimple stmt;
> -  bool reliable;
>  };
>  
>  static bool
> @@ -2660,7 +2668,7 @@ idx_infer_loop_bounds (tree base, tree *
>struct ilb_data *data = (struct ilb_data *) dta;
>tree ev, init, step;
>tree low, high, type, next;
> -  bool sign, upper = data->reliable, at_end = false;
> +  bool sign, upper = true, at_end = false;
>struct loop *loop = data->loop;
>  
>if (TREE_CODE (base) != ARRAY_REF)
> @@ -2737,14 +2745,12 @@ idx_infer_loop_bounds (tree base, tree *
> STMT is guaranteed to be executed in every iteration of LOOP.*/
>  
>  static void
> -infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref,
> - bool reliable)
> +infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref)
>  {
>struct ilb_data data;
>  
>data.loop = loop;
>data.stmt = stmt;
> -  data.reliable = reliable;
>for_each_index (&ref, idx_infer_loop_bounds, &data);
>  }
>  
> @@ -2753,7 +2759,7 @@ infer_loop_bounds_from_ref (struct loop 
> executed in every iteration of LOOP.  */
>  
>  static void
> -infer_loop_bounds_from_array (struct loop *loop, gimple stmt, bool reliable)
> +infer_loop_bounds_from_array (struct loop *loop, gimple stmt)
>  {
>if (is_gimple_assign (stmt))
>  {
> @@ -2763,10 +2769,10 @@ infer_loop_bounds_from_array (struct loo
>   
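
Jan's example from the mail above, made compilable (t and q are stand-ins,
and passing the bound n explicitly is an assumption): the two q calls sit on
the branches of a conditional, so neither array access dominates the loop
latch, yet with this patch both accesses are still recorded as implying the
bound i < 10.

```c
int a[10];
int b[10];
static int calls;

static int t (int i) { return i & 1; }        /* arbitrary predicate */
static void q (int x) { (void) x; calls++; }  /* counts invocations */

void
walk (int n)
{
  for (int i = 0; i < n; i++)
    if (t (i))
      q (a[i]);   /* executed only on odd iterations ...        */
    else
      q (b[i]);   /* ... so neither access dominates the latch,
                     yet a[i]/b[i] both imply i < 10.            */
}
```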

Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 03:38:11PM +0100, Richard Biener wrote:
> I question the need of BLOCK info on expression trees.  If BLOCKs are
> relevant then the tree ends up referencing a declaration with a BLOCK as
> context, no?  Thus, the case
> 
>   int tem, a;
>   {
> int a;
> ...
> tem = a;
>   }
>   int b = tem + 5;
> 
> where we may end up with gimple like
> 
>   b = a + 5;
> 
> thus mixing two BLOCKs inside a stmt (and no expression trees to
> attach different BLOCKs) is no different from the case where we end
> up with expression trees.

IMHO either we don't use locations at all on tree expressions (thus
no source location nor block), or both.  A source location has always
an associated block it is present in.  Of course for shared trees we
can't put there any block, as it can appear anywhere.

> Thus my original question - why isn't a NULL BLOCK treated the same
> as UNKNOWN_LOCATION?  Or rather, _why_ does Michas patch not work?
> Did you analyze the guality fails?

Micha's patch is degrading debug info quality.  Whenever some expression
has different source location from the source location of the gimple stmt,
then assuming that other source location is from the same block is wrong,
it could very well be from a different one.  On the testcase that fails
with Micha's patch, we have:
  [pr43479.c : 8:4] l_2 = l_1(D) + 1;
  [pr43479.c : 8:4] # DEBUG l => l_2
  [pr43479.c : 10:9] # DEBUG h => n_3(D)
  [pr43479.c : 12:11] # DEBUG i => k_4(D)
  [pr43479.c : 13:8] k_5 = k_4(D) + 1;
  [pr43479.c : 13:8] # DEBUG k => k_5
  [pr43479.c : 17:11] # DEBUG j => m_6(D)
  [pr43479.c : 18:8] m_7 = m_6(D) + 1;
  [pr43479.c : 18:8] # DEBUG m => m_7
  [pr43479.c : 22:3] __asm__ __volatile__("" :  : "r" k_5, "r" l_2);
  [pr43479.c : 23:3] __asm__ __volatile__("" :  : "r" m_7, "r" n_3(D));
where line 8 is from the outer block in the source, line 10 from the middle
block, line 12/13 from first innermost block, line 17/18 from second
innermost block.  But all of the l_2, k_5 and m_7 setters are TERed,
so everything is emitted when expanding the two asm
statements and with Micha's patch the block used is the block of the asm
statement, while previously each TERed statement got its own block.

I'd say either we should do the TREE_BLOCK setting on all non-shareable
trees during gimple-low and clear the block (but then likely whole
location?; it doesn't make sense to say in the debugger that something
has certain source location when you can't print variables declared in that
location) if copying expressions for use elsewhere, outside of the
containing function.  Or say during gimplification or gimple-low.c
simply set t->exp.locus of all non-shareable expressions to UNKNOWN_LOCATION
to make it clear we don't use it (wonder if that could affect debug info
quality, perhaps not that much), but during expansion if creating trees
from TERed stmts they need to be set back, or the current location/block
adjusted accordingly.

Jakub
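
The failing guality test has roughly this shape (a loose reconstruction from
the dump above, not the actual gcc.dg/guality/pr43479.c; the return statement
is added only to make the sketch testable):

```c
/* l, k and m are incremented in different lexical blocks; the increments
   are TERed and materialized only at the asm statements, so each expanded
   expression carries a TREE_BLOCK different from the asm's own block.  */
int
f (int k, int l, int m, int n)
{
  l = l + 1;                 /* outer block (line 8 in the dump) */
  {
    int h = n;               /* middle block */
    {
      int i = k;             /* first innermost block */
      k = k + 1;
      (void) i;
    }
    {
      int j = m;             /* second innermost block */
      m = m + 1;
      (void) j;
    }
    (void) h;
  }
  __asm__ __volatile__ ("" : : "r" (k), "r" (l));
  __asm__ __volatile__ ("" : : "r" (m), "r" (n));
  return k + l + m + n;
}
```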


Re: [PATCH] PR c++/54955 - Fail to parse alignas expr at the beginning of a declaration

2012-10-30 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 4:03 PM, Jakub Jelinek  wrote:
> On Tue, Oct 30, 2012 at 03:38:11PM +0100, Richard Biener wrote:
>> I question the need of BLOCK info on expression trees.  If BLOCKs are
>> relevant then the tree ends up referencing a declaration with a BLOCK as
>> context, no?  Thus, the case
>>
>>   int tem, a;
>>   {
>> int a;
>> ...
>> tem = a;
>>   }
>>   int b = tem + 5;
>>
>> where we may end up with gimple like
>>
>>   b = a + 5;
>>
>> thus mixing two BLOCKs inside a stmt (and no expression trees to
>> attach different BLOCKs) is no different from the case where we end
>> up with expression trees.
>
> IMHO either we don't use locations at all on tree expressions (thus
> no source location nor block), or both.  A source location has always
> an associated block it is present in.  Of course for shared trees we
> can't put there any block, as it can appear anywhere.
>
>> Thus my original question - why isn't a NULL BLOCK treated the same
>> as UNKNOWN_LOCATION?  Or rather, _why_ does Michas patch not work?
>> Did you analyze the guality fails?
>
> Micha's patch is degrading debug info quality.  Whenever some expression
> has different source location from the source location of the gimple stmt,
> then assuming that other source location is from the same block is wrong,
> it could very well be from a different one.  On the testcase that fails
> with Micha's patch, we have:
>   [pr43479.c : 8:4] l_2 = l_1(D) + 1;
>   [pr43479.c : 8:4] # DEBUG l => l_2
>   [pr43479.c : 10:9] # DEBUG h => n_3(D)
>   [pr43479.c : 12:11] # DEBUG i => k_4(D)
>   [pr43479.c : 13:8] k_5 = k_4(D) + 1;
>   [pr43479.c : 13:8] # DEBUG k => k_5
>   [pr43479.c : 17:11] # DEBUG j => m_6(D)
>   [pr43479.c : 18:8] m_7 = m_6(D) + 1;
>   [pr43479.c : 18:8] # DEBUG m => m_7
>   [pr43479.c : 22:3] __asm__ __volatile__("" :  : "r" k_5, "r" l_2);
>   [pr43479.c : 23:3] __asm__ __volatile__("" :  : "r" m_7, "r" n_3(D));
> where line 8 is from the outer block in the source, line 10 from the middle
> block, line 12/13 from first innermost block, line 17/18 from second
> innermost block.  But all of the l_2, k_5 and m_7 setters are TERed,
> so everything is emitted when expanding the two asm
> statements and with Micha's patch the block used is the block of the asm
> statement, while previously each TERed statement got its own block.
>
> I'd say either we should do the TREE_BLOCK setting on all non-shareable
> trees during gimple-low and clear the block (but then likely whole
> location?; it doesn't make sense to say in the debugger that something
> has certain source location when you can't print variables declared in that
> location) if copying expressions for use elsewhere, outside of the
> containing function.  Or say during gimplification or gimple-low.c
> simply set t->exp.locus of all non-shareable expressions to UNKNOWN_LOCATION
> to make it clear we don't use it (wonder if that could affect debug info
> quality, perhaps not that much), but during expansion if creating trees
> from TERed stmts they need to be set back, or the current location/block
> adjusted accordingly.

So maybe TER (well, those looking up the stmt) should pick the location
from the TERed statement properly then?

Richard.

> Jakub


Re: [PR54693] loss of debug info in jump threading and loop ivopts

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 03:51:19PM +0100, Richard Biener wrote:
> + FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def)
> +   {
> + if (!gimple_debug_bind_p (stmt))
> +   continue;
> +
> + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> +   SET_USE (use_p, comp);
> +
> + if (!comp)
> +   BREAK_FROM_IMM_USE_STMT (imm_iter);
> 
> how does comp magically become NULL_TREE here?

Looks like pasto to me, from the first loop.

BTW, I'd also think that the first loop should set count = 2 if
the debug stmt already has a non-trivial expression (i.e. rhs isn't
just the SSA_NAME), to force a debug temp in that case to avoid creating too
large debug stmts.

> Btw, what's all the fuzz with IV candidates, etc?  At least for non-PHIs
> I don't see why the regular release_ssa_name way of doing things does
> not work.  IVOPTs is slow enough ...

Even if it is non-PHI, release_ssa_name will replace it with the definition,
and then on another release_ssa_name again and again, and finally either
be lucky enough that some SSA_NAME stays (is an IV that has been kept), but
more often you'll just reach the PHI node and end up with a long list of:
  DEBUG D#7 => NULL
  DEBUG D#6 => D#7
  DEBUG D#5 => D#6
  DEBUG D#4 => D#5
  DEBUG D#3 => D#4
  DEBUG D#2 => D#3
  DEBUG D#1 => D#2
(the NULL because of PHI).  We don't need to find optimal IV replacement,
so the code tries just a couple of them (perhaps the 64 should be a param,
and perhaps could be lower by default), it just helps if the expression is
smaller (smaller debug info), and if it contains as few SSA_NAMEs as
possible (then it is more likely it will actually be useful).

Jakub


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote:
> So maybe TER (well, those looking up the stmt) should pick the location
> from the TERed statement properly then?

Perhaps, but Micha's patch doesn't do that.
But in that case IMHO it still would help to set all expr locations to
UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore
the locations.

Jakub


Re: [PATCH] pass filtering for -fopt-info

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 9:21 AM, Sharad Singhai  wrote:
> As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html,
> I have added the -fopt-info pass filtering in the attached patch.
>
> The basic idea is that there are optimization pass groups and a user
> can selectively enable dumps for these group(s) via command-line
> syntax. Currently, I define the following optimization groups: 'loop',
> 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups.
>
> If a pass doesn't explicitly specify an optimization group, (denoted
> by OPTGROUP_NONE) then a group is assigned based on the pass type.
> These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'.
>
> Also there is a catch-all group, called 'optall'.
>
> The options for -fopt-info dumps verbosity remain 'optimized',
> 'missed', 'note', and 'all'. Since these two types of options,
> verbosity and optimization groups are non-overlapping, I have decided
> to freely mix them. Something like this works as expected, i.e., dump
> missed vectorizer info into vec.missed.
>
> gcc ... -fopt-info-vec-missed=vec.missed
>
> which is equivalent to
>
> gcc ... -fopt-info-missed-vec=vec.missed
>
> However, the order still matters, and it can be somewhat confusing. For 
> example,
>
> gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt
>
> will dump missed and optimized vectorizer info into vec.opt, while no
> vec.miss is produced. This is due to the fact that the latter group
> specifier, 'vec' overrides the first one. However, the 'missed' and
> 'optimized' are both honored as there is no conflict there. This is
> somewhat confusing. Hopefully, this type of usage would not be common.

What I'd expect from that would be both vec.miss and vec.opt being
populated ... (thus go away from the duality of dump files to
primary dump file plus a set of alternate dump files).

> I have updated the documentation to include -fopt-info examples, and
> added some details about -fopt-info command line conflicts.

I like it overall, not sure if we want to pre-populate the OPTGROUP
set too much at this point.  Like what is 'tree' or 'rtl' to users?
nothing I think.
'ipa' yes.  'lto'?  sounds redundant with 'ipa' to me.  'omp'?  we don't have
any optimizations here.

Thus please drop TREE, RTL, LTO and OMP for now.

Otherwise I'm leaving it for comments from other folks.

Thanks,
Richard.

> Thanks,
> Sharad


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Richard Biener
On Tue, Oct 30, 2012 at 4:21 PM, Jakub Jelinek  wrote:
> On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote:
>> So maybe TER (well, those looking up the stmt) should pick the location
>> from the TERed statement properly then?
>
> Perhaps, but Micha's patch doesn't do that.
> But in that case IMHO it still would help to set all expr locations to
> UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore
> the locations.

Yes indeed.

Richard.

> Jakub


Re: PATCH: PR rtl-optimization/55093: [4.8 Regression] [x32] -maddress-mode=long failed

2012-10-30 Thread Vladimir Makarov

On 10/30/2012 06:34 AM, Richard Sandiford wrote:

"H.J. Lu"  writes:

LRA has

   if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
 {
   rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;

   if (! replace_p)
 {
   offset += (ep->offset - ep->previous_offset);
   offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
 }

   if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
 to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

Reload has

 rtx to_rtx = ep->to_rtx;
 offset += ep->offset;
 offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));

 if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
   to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)),
 to_rtx);

(gdb) call debug_rtx (ep->to_rtx)
(reg/f:DI 7 sp)
(gdb) call debug_rtx (ep->from_rtx)
(reg/f:DI 16 argp)
(gdb)

gen_lowpart returns (reg/f:DI 7 sp) for reload and (reg:SI 16 argp)
for LRA.   They are caused by

   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
   /* We should convert arg register in LRA after the elimination
  if it is possible.  */
   && xregno == ARG_POINTER_REGNUM
   && ! lra_in_progress)
 return -1;

It doesn't work in this case.


This testcase shows that LRA can't convert arg register after
the elimination.


Here is a patch to remove the lra_in_progress check for
ARG_POINTER_REGNUM.  Tested on Linux.x86-64.
OK to install?

Thanks HJ.  This looks good to me.  As well as your testcase, I think
it would be dangerous to reduce this kind of subreg during non-final
elimination in cases where the argument pointer occupies more than one
hard register (like avr IIRC).  We could end up with something like
ARG_POINTER_REGNUM+1, which wouldn't show up as an elimination register
during the rest of LRA.

It's important that we do get rid of the subreg during the final
elimination stage, but I think alter_subreg already handles that case.

Since this code is outside the LRA files: patch is OK if Vlad agrees.


I added this code for a reason, probably to solve some target problems.

So I am not sure but let us try.

It is ok for me to commit the patch if there are no regressions on 
x86/x86-64.





Re: [PATCH] Replace const_vector with match_operand in sse.md

2012-10-30 Thread Uros Bizjak
On Tue, Oct 30, 2012 at 3:47 PM, Andrey Turetskiy
 wrote:
> I changed the patch according Uros' remarks. Please, have a look.
>
> Changelog:
>
> 2012-10-30  Andrey Turetskiy  
>
>* config/i386/i386.c (bdesc_args): Rename CODE_FOR_avx2_umulhrswv16hi3 
> to
>CODE_FOR_avx2_pmulhrswv16hi3.
>* config/i386/predicates.md (const1_operand): Extend for vectors.
>* config/i386/sse.md (ssse3_avx2): Extend.
>(ssedoublemode): Ditto.
>(_uavg3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
>avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
>(*_uavg3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
>*avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
>(PMULHRSW): New.
>(_pmulhrsw3): Merge avx2_umulhrswv16hi3,
>ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
>(*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.
>(*ssse3_pmulhrswv8hi3): Ditto.
>(*ssse3_pmulhrswv4hi3): Ditto.

+{
+  ix86_fixup_binary_operands_no_copy (PLUS, mode, operands);
+  operands[3] = CONST1_RTX(mode);
+})

Please put operands[3] initialization before the call to
ix86_f_b_o_n_c. We don't want to pass uninitialized operands around.

+{
+  if (which_alternative == 0
+  && (mode == V16QImode
+  || mode == V8HImode))
+return "pavg\t{%2, %0|%0, %2}";
+  return "vpavg\t{%2, %1, %0|%0, %1, %2}";
+}
+  [(set (attr "isa")
+ (if_then_else
+   (match_test "which_alternative == 0 && (mode ==
V16QImode || mode == V8HImode)")
+ (const_string "noavx")
+ (const_string "avx")))
+   (set (attr "prefix_data16")
+(if_then_else (eq_attr "isa" "noavx")
+  (const_string "1")
+  (const_string "*")))
+   (set (attr "prefix")
+(if_then_else (eq_attr "isa" "noavx")
+  (const_string "orig")
+  (const_string "vex")))

Uh, oh.

Please note that "isa" attribute enables or disables the alternative
through "enabled" attribute. Just change the "mode" attribute to
"" and everything will magically work:
- AVX2 implies AVX, so it enables alternative 1, while disabling
alternative 0 (and vice versa when AVX is disabled through noavx isa
attribute).
- Correct modes are conditionally enabled via VI12_AVX2 iterator
- Base ISA level is enabled via insn predicate (TARGET_SSE2).

You have to track three dependent conditions to calculate how insn
pattern/mode/operand predicate are enabled ;)

Uros,


Re: [PATCH] pass filtering for -fopt-info

2012-10-30 Thread Xinliang David Li
On Tue, Oct 30, 2012 at 1:21 AM, Sharad Singhai  wrote:
> As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html,
> I have added the -fopt-info pass filtering in the attached patch.
>
> The basic idea is that there are optimization pass groups and a user
> can selectively enable dumps for these group(s) via command-line
> syntax. Currently, I define the following optimization groups: 'loop',
> 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups.
>
> If a pass doesn't explicitly specify an optimization group, (denoted
> by OPTGROUP_NONE) then a group is assigned based on the pass type.
> These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'.


I agree with Richard -- we don't need these implicit groups.

>
> Also there is a catch-all group, called 'optall'.


Why is this needed? Isn't that the default?


>
> The options for -fopt-info dumps verbosity remain 'optimized',
> 'missed', 'note', and 'all'. Since these two types of options,
> verbosity and optimization groups are non-overlapping, I have decided
> to freely mix them. Something like this works as expected, i.e., dump
> missed vectorizer info into vec.missed.
>
> gcc ... -fopt-info-vec-missed=vec.missed
>
> which is equivalent to
>
> gcc ... -fopt-info-missed-vec=vec.missed
>
> However, the order still matters, and it can be somewhat confusing. For 
> example,
>
> gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt
>
> will dump missed and optimized vectorizer info into vec.opt, while no
> vec.miss is produced. This is due to the fact that the latter group
> specifier, 'vec' overrides the first one. However, the 'missed' and
> 'optimized' are both honored as there is no conflict there. This is
> somewhat confusing. Hopefully, this type of usage would not be common.

Please document that only one alt dump file per optimization-group is supported.

>
> I have updated the documentation to include -fopt-info examples, and
> added some details about -fopt-info command line conflicts.

thanks,

David

>
> Thanks,
> Sharad


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Eric Botcazou
> I'd say either we should do the TREE_BLOCK setting on all non-shareable
> trees during gimple-low and clear the block (but then likely whole
> location?; it doesn't make sense to say in the debugger that something
> has certain source location when you can't print variables declared in that
> location) if copying expressions for use elsewhere, outside of the
> containing function.  Or say during gimplification or gimple-low.c
> simply set t->exp.locus of all non-shareable expressions to UNKNOWN_LOCATION
> to make it clear we don't use it (wonder if that could affect debug info
> quality, perhaps not that much), but during expansion if creating trees
> from TERed stmts they need to be set back, or the current location/block
> adjusted accordingly.

The debugger isn't the only consumer of debug info, and other tools might need 
a finer granularity than a GIMPLE location, so clearing EXPR_LOCATION to work 
around a debug info size issue seems very short-sighted (to say the least).

-- 
Eric Botcazou


[PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]

2012-10-30 Thread Florian Weimer

On 09/17/2012 12:54 PM, Florian Weimer wrote:

On 09/17/2012 12:15 PM, Paolo Carlini wrote:

Hi,

On 09/17/2012 11:51 AM, Florian Weimer wrote:

On 08/21/2012 12:37 PM, Florian Weimer wrote:

I don't think there are any callers out there, but let's fix this for
completeness.

A compiler emitting code to call this function would still have to
perform overflow checks for the new T[n][m] case, so this interface is
not as helpful as it looks at first glance.

Tested on x86_64-redhat-linux-gnu.


Ping?

This function is apparently used by compilers based on the EDG
front-end, so it's not actually dead.



Being code that touches the library, the patch should go to the
libstdc++ maliling list too. Likewise the testcase, should be in the
libstdc++ testsuite, I guess.


Oh, I thought that this wouldn't apply to internal C++ support code.
Sorry.


That said, I didn't really follow the details of your recent work. Who
did? Jason? I would gently ping the same maintainer.


Indeed, Jason reviewed that.  Cc:ing.


Ping?

Patch is at: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01416.html

Thanks,
Florian
--
Florian Weimer / Red Hat Product Security Team
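For readers outside the thread, the kind of check being added can be sketched in plain C (illustrative only: the function name and the padding handling are assumptions, not the actual libsupc++ code). The array allocation size element_count * element_size + cookie padding must be computed without wrapping:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of an overflow-checked size computation for
   operator new[]-style allocations.  Returns 0 on overflow, otherwise
   stores count * size + padding in *total.  */
static int
checked_vec_size (size_t count, size_t size, size_t padding, size_t *total)
{
  /* If count * size would exceed SIZE_MAX - padding, the sum wraps.  */
  if (size != 0 && count > (SIZE_MAX - padding) / size)
    return 0;
  *total = count * size + padding;
  return 1;
}
```

On overflow the runtime routine can then fail the allocation instead of returning an undersized buffer.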


Re: [PATCH] pass filtering for -fopt-info

2012-10-30 Thread Xinliang David Li
On Tue, Oct 30, 2012 at 8:28 AM, Richard Biener
 wrote:
> On Tue, Oct 30, 2012 at 9:21 AM, Sharad Singhai  wrote:
>> As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html,
>> I have added the -fopt-info pass filtering in the attached patch.
>>
>> The basic idea is that there are optimization pass groups and a user
>> can selectively enable dumps for these group(s) via command-line
>> syntax. Currently, I define the following optimization groups: 'loop',
>> 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups.
>>
>> If a pass doesn't explicitly specify an optimization group, (denoted
>> by OPTGROUP_NONE) then a group is assigned based on the pass type.
>> These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'.
>>
>> Also there is a catch-all group, called 'optall'.
>>
>> The options for -fopt-info dumps verbosity remain 'optimized',
>> 'missed', 'note', and 'all'. Since these two types of options,
>> verbosity and optimization groups are non-overlapping, I have decided
>> to freely mix them. Something like this works as expected, i.e., dump
>> missed vectorizer info into vec.missed.
>>
>> gcc ... -fopt-info-vec-missed=vec.missed
>>
>> which is equivalent to
>>
>> gcc ... -fopt-info-missed-vec=vec.missed
>>
>> However, the order still matters, and it can be somewhat confusing. For 
>> example,
>>
>> gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt
>>
>> will dump missed and optimized vectorizer info into vec.opt, while no
>> vec.miss is produced. This is due to the fact that the latter group
>> specifier, 'vec' overrides the first one. However, the 'missed' and
>> 'optimized' are both honored as there is no conflict there. This is
>> somewhat confusing. Hopefully, this type of usage would not be common.
>
> What I'd expect from that would be both vec.miss and vec.opt being
> populated ... (thus go away from the duality of dump files to
> primary dump file plus a set of alternate dump files).
>
>> I have updated the documentation to include -fopt-info examples, and
>> added some details about -fopt-info command line conflicts.
>
> I like it overall, not sure if we want to pre-populate the OPTGROUP
> set too much at this point.  Like what is 'tree' or 'rtl' to users?
> nothing I think.
> 'ipa' yes.  'lto'?  sounds redundant with 'ipa' to me.  'omp'?  we don't have
> any optimizations here.


OMP is a high-level transformation, and it seems to be a good
candidate group, but this part does not need to be designed now. For
instance, there are a bunch of FDO-related transformations (indirect
call promotion, memcpy transformation, etc.), and coverage mismatch
notes etc. are good candidates to be filtered.

thanks,

David


>
> Thus please drop TREE, RTL, LTO and OMP for now.
>
> Otherwise I'm leaving it for comments from other folks.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Sharad
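One plausible way to implement the group filtering under discussion is a bitmask per pass. This is a sketch under assumptions: the OPTGROUP_* values and the matching rule are illustrative, not the patch itself.

```c
#include <assert.h>

/* Illustrative optimization-group flags; a pass may belong to several.  */
#define OPTGROUP_NONE    0u
#define OPTGROUP_LOOP    (1u << 0)
#define OPTGROUP_VEC     (1u << 1)
#define OPTGROUP_INLINE  (1u << 2)
#define OPTGROUP_IPA     (1u << 3)
#define OPTGROUP_ALL     (OPTGROUP_LOOP | OPTGROUP_VEC \
                          | OPTGROUP_INLINE | OPTGROUP_IPA)

/* A -fopt-info-<group> request selects a pass iff the masks intersect.  */
static int
optinfo_enabled_p (unsigned pass_groups, unsigned requested_groups)
{
  return (pass_groups & requested_groups) != 0;
}
```

Under this model a loop vectorization pass tagged OPTGROUP_LOOP | OPTGROUP_VEC would match both -fopt-info-loop and -fopt-info-vec, while 'optall' maps to OPTGROUP_ALL.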


Re: RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_lenth attribute to support the ARC port)

2012-10-30 Thread Joern Rennecke

Quoting Richard Biener :


Apart from the iteration_threshold the hookization would be straight-forward.
Now I cannot decipher from the patch what functional change it introduces ;)


The only change occurs if we reach an iteration count of MAX_INT iterations -
which should already be indicative of a problem.

At the MAX_INTth iteration, we will apply the length locking logic to
instructions inside a delay slot sequence as well.

If we disregard this exceptional case, there should be no functional changes
unless someone defines TARGET_ADJUST_INSN_LENGTH.

uid_lock_length gets zero-initialized when allocated with XCNEWVEC.  So, in
particular, the elements corresponding to instructions inside delay slot
sequences are initialized to zero.

As the default hook sets *iter_threshold to MAX_INT when inside a delay
sequence, and doesn't change the length, the max operation with
uid_lock_length is a no-op, and *niter < iter_threshold is true, hence a
length change results in updating the length immediately, without changing
uid_lock_length.

In the case that we are not inside a delay slot sequence, the default hook
leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH.  Thus, when niter
is 0 or larger, as is the case for the ordinary looping operation, we
always find *niter >= iter_threshold, and thus apply the length  
locking mechanism.


If we are in the preliminary pass, or doing the single !increasing iteration,
niter is set to -1, so *niter < iter_threshold is always true, and again we
update the length immediately.
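The update rule described above can be modelled in a few lines of plain C (a sketch: update_length and the lock table are illustrative stand-ins for the shorten_branches internals, not the real interface):

```c
#include <assert.h>

#define ITER_MAX 0x7fffffff      /* stands in for MAX_INT above */

/* Illustrative per-uid lock table; XCNEWVEC zero-initializes it.  */
static int uid_lock_length[8];

/* Apply one length update for insn `uid` on iteration `niter`.
   Below the threshold (e.g. inside a delay slot sequence, where the
   default hook sets the threshold to MAX_INT), the new length is taken
   immediately; at or above it, lengths are locked and may only grow.  */
static int
update_length (int uid, int niter, int iter_threshold, int new_len)
{
  if (niter < iter_threshold)
    return new_len;                        /* immediate update */
  if (new_len > uid_lock_length[uid])      /* max with the lock */
    uid_lock_length[uid] = new_len;
  return uid_lock_length[uid];
}
```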


There are also ony 10 users of ADJUST_INSN_LENGTH, not enough to
warrant introducing a hook without removing the macro IMHO.


Ok, I can prepare a patch for that, even though it makes it even a bit
less obvious that there's no functional change.

What about the preparatory patch
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html ?
Can I check that in now?


Re: [PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]

2012-10-30 Thread Paolo Carlini
Hi,

Florian Weimer  ha scritto:

>Ping?
>
>Patch is at: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01416.html

Sorry, I don't know the code well enough to review your patch, but since I'm in 
CC, I still don't understand why, instead of adding a full libstdc++ testcase 
you are extending a C++ testcase, in old-deja even, normally considered legacy.

Paolo




[PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-10-30 Thread Steven Bosscher
Hello,

Hot/cold partitioning is apparently a hot topic all of a sudden, which
is a good thing of course, because it's in need of some TLC.

The attached patch adds another check to the RTL cfg checking
(verify_flow_info) for the partitioning: A hot block can never be
dominated by a cold block (because the dominated block must also be
cold). This trips in PR55121.

I haven't tested this with any profiling tests, but it's bound to
break things. From my POV, whatever gets broken by this patch was
already broken to begin with :-)   If you're in CC, it's because I
hope you can help test this patch.

Downside of this patch is that I need dominance info. If it's not
available, I compute and free it. I'm not sure if this works if
dominance info status is DOM_NO_FAST_QUERY, and I don't want to
recompute in this case because IMHO a verifier should be a no-op from
the POV of the rest of the compiler, and updating dominators would
make this patch a not-a-no-op :-)

Thoughts/comments?

Ciao!
Steven

* cfgrtl.c (rtl_verify_flow_info_1): Verify that blocks in the
hot partition are not dominated by blocks in the cold partition.

Index: cfgrtl.c
===
--- cfgrtl.c(revision 191819)
+++ cfgrtl.c(working copy)
@@ -2033,6 +2033,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;

   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2145,6 +2146,8 @@ rtl_verify_flow_info_1 (void)
n_eh++;
  else if (e->flags & EDGE_ABNORMAL)
n_abnormal++;
+
+ have_partitions |= is_crossing;
}

   if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2263,6 +2268,40 @@ rtl_verify_flow_info_1 (void)
  }
 }

+  /* If there are partitions, do a sanity check on them: A basic block in
+ a cold partition cannot dominate a basic block in a hot partition.  */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  if (have_partitions && !err)
+FOR_EACH_BB (bb)
+  if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+   VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+{
+  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+  basic_block son;
+
+  if (dom_calculated_here)
+   calculate_dominance_info (CDI_DOMINATORS);
+
+  while (! VEC_empty (basic_block, bbs_in_cold_partition))
+   {
+ bb = VEC_pop (basic_block, bbs_in_cold_partition);
+ if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+   {
+ error ("non-cold basic block %d dominated "
+"by a block in the cold partition", bb->index);
+ err = 1;
+   }
+ for (son = first_dom_son (CDI_DOMINATORS, bb);
+  son;
+  son = next_dom_son (CDI_DOMINATORS, son))
+   VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+   }
+
+  if (dom_calculated_here)
+   free_dominance_info (CDI_DOMINATORS);
+}
+
   /* Clean up.  */
   return err;
 }
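The invariant the new check enforces can be demonstrated on a toy dominator tree in self-contained C (not the GCC data structures): every block transitively dominated by a cold block must itself be cold.

```c
#include <assert.h>

enum partition { HOT, COLD };

/* dom_parent[i] is the immediate dominator of block i (-1 for the entry).
   Returns 1 iff no hot block is dominated by a cold block.  */
static int
partitions_sane_p (int n, const int *dom_parent, const enum partition *part)
{
  for (int b = 0; b < n; b++)
    if (part[b] == HOT)
      for (int d = dom_parent[b]; d != -1; d = dom_parent[d])
        if (part[d] == COLD)
          return 0;          /* hot block dominated by a cold one */
  return 1;
}
```

The patch performs the same walk top-down with a worklist over first_dom_son/next_dom_son instead of following parent links.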


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
On Tue, Oct 30, 2012 at 8:29 AM, Richard Biener
 wrote:
> On Tue, Oct 30, 2012 at 4:21 PM, Jakub Jelinek  wrote:
>> On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote:
>>> So maybe TER (well, those looking up the stmt) should pick the location
>>> from the TERed statement properly then?
>>
>> Perhaps, but Micha's patch doesn't do that.
>> But in that case IMHO it still would help to set all expr locations to
>> UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore
>> the locations.
>
> Yes indeed.

I agree, but this looks like too bold a move at this point. Shall we
do that in 4.8?

BTW, I updated the patch to ensure pr43479.c works fine. The testing
is still on-going.

Dehao

gcc/ChangeLog:
2012-10-25  Dehao Chen  

* tree-eh.c (do_return_redirection): Set location for jump statement.
(do_goto_redirection): Likewise.
(frob_into_branch_around): Likewise.
(lower_try_finally_nofallthru): Likewise.
(lower_try_finally_copy): Likewise.
(lower_try_finally_switch): Likewise.
* expr.c (store_expr): Use current insn location instead of expr
location.
(expand_expr_real): Likewise.
(expand_expr_real_1): Likewise.

gcc/testsuite/ChangeLog:
2012-10-25  Dehao Chen  

* g++.dg/debug/dwarf2/block.C: New testcase.
Index: testsuite/g++.dg/debug/dwarf2/block.C
===
--- testsuite/g++.dg/debug/dwarf2/block.C   (revision 0)
+++ testsuite/g++.dg/debug/dwarf2/block.C   (revision 0)
@@ -0,0 +1,29 @@
+// Compiler should not generate too many lexical blocks for this function.
+// { dg-do compile { target { i?86-*-* x86_64-*-* } } }
+// { dg-options "-O0 -fno-exceptions -g -dA" }
+
+union UElement {
+void* pointer;
+int integer;
+};
+struct UColToken {
+  unsigned source;
+  unsigned char **rulesToParseHdl;
+};
+
+int uhash_hashTokens(const union UElement k)
+{
+  int hash = 0;
+  struct UColToken *key = (struct UColToken *)k.pointer;
+  if (key != 0) {
+int len = (key->source & 0xFF00)>>24;
+int inc = ((len - 32) / 32) + 1;
+const unsigned char *p = (key->source & 0x00FF)
++ *(key->rulesToParseHdl);
+const unsigned char *limit = p + len;
+hash = *p + *limit;
+  }
+  return hash;
+}
+
+// { dg-final { scan-assembler-not "LBB10" } }
Index: tree-eh.c
===
--- tree-eh.c   (revision 192809)
+++ tree-eh.c   (working copy)
@@ -739,6 +739,7 @@ do_return_redirection (struct goto_queue_node *q,
 gimple_seq_add_seq (&q->repl_stmt, mod);
 
   x = gimple_build_goto (finlab);
+  gimple_set_location (x, q->location);
   gimple_seq_add_stmt (&q->repl_stmt, x);
 }
 
@@ -758,6 +759,7 @@ do_goto_redirection (struct goto_queue_node *q, tr
 gimple_seq_add_seq (&q->repl_stmt, mod);
 
   x = gimple_build_goto (finlab);
+  gimple_set_location (x, q->location);
   gimple_seq_add_stmt (&q->repl_stmt, x);
 }
 
@@ -857,6 +859,7 @@ frob_into_branch_around (gimple tp, eh_region regi
   if (!over)
over = create_artificial_label (loc);
   x = gimple_build_goto (over);
+  gimple_set_location (x, loc);
   gimple_seq_add_stmt (&cleanup, x);
 }
   gimple_seq_add_seq (&eh_seq, cleanup);
@@ -1085,6 +1088,7 @@ lower_try_finally_nofallthru (struct leh_state *st
  emit_post_landing_pad (&eh_seq, tf->region);
 
  x = gimple_build_goto (lab);
+ gimple_set_location (x, gimple_location (tf->try_finally_expr));
  gimple_seq_add_stmt (&eh_seq, x);
}
 }
@@ -1223,6 +1227,7 @@ lower_try_finally_copy (struct leh_state *state, s
 
   tmp = lower_try_finally_fallthru_label (tf);
   x = gimple_build_goto (tmp);
+  gimple_set_location (x, tf_loc);
   gimple_seq_add_stmt (&new_stmt, x);
 }
 
@@ -1395,6 +1400,7 @@ lower_try_finally_switch (struct leh_state *state,
 
   tmp = lower_try_finally_fallthru_label (tf);
   x = gimple_build_goto (tmp);
+  gimple_set_location (x, tf_loc);
   gimple_seq_add_stmt (&switch_body, x);
 }
 
@@ -1423,6 +1429,7 @@ lower_try_finally_switch (struct leh_state *state,
   gimple_seq_add_stmt (&eh_seq, x);
 
   x = gimple_build_goto (finally_label);
+  gimple_set_location (x, tf_loc);
   gimple_seq_add_stmt (&eh_seq, x);
 
   tmp = build_int_cst (integer_type_node, eh_index);
Index: expr.c
===
--- expr.c  (revision 192809)
+++ expr.c  (working copy)
@@ -5030,7 +5030,7 @@ store_expr (tree exp, rtx target, int call_param_p
 {
   rtx temp;
   rtx alt_rtl = NULL_RTX;
-  location_t loc = EXPR_LOCATION (exp);
+  location_t loc = curr_insn_location ();
 
   if (VOID_TYPE_P (TREE_TYPE (exp)))
 {
@@ -7869,31 +7869,7 @@ expand_expr_real (tree exp, rtx target, enum machi
   return ret ? ret : const0_rtx;
 }
 
-  /* If this is an expression of som

Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 8:49 AM, Matthew Gretton-Dann wrote:
> On 30 October 2012 05:20, Teresa Johnson wrote:
>> Index: cfgrtl.c
>> ===
>> --- cfgrtl.c(revision 192692)
>> +++ cfgrtl.c(working copy)
>> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>>   partition boundaries).  See  the comments at the top of
>>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (BB_PARTITION (a) != BB_PARTITION (b))
>> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
>> +  || BB_PARTITION (a) != BB_PARTITION (b))
>>  return false;
>>
>>/* Protect the loop latches.  */
>> @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
>>   partition boundaries).  See  the comments at the top of
>>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (BB_PARTITION (a) != BB_PARTITION (b))
>> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
>> +  || BB_PARTITION (a) != BB_PARTITION (b))
>>  return false;
>>
>>/* Protect the loop latches.  */
>
> As this if() condition seems to be the canonical way to detect being
> in a different partition should it be moved out into a query function,
> and all of cfgrtl.c updated to use it?

Not just in cfgrtl.c but for example also in ifcvt.c (which currently
only tests for notes, that's broken).

Ciao!
Steven


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
BTW, one thing I found confusing is that in expr.c, some code is for
the front end, while some is for RTL. Shall we separate them into two
files? And we don't expect to see EXPR_LOCATION on the RTL side.

Thanks,
Dehao


Re: [PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]

2012-10-30 Thread Florian Weimer

On 10/30/2012 05:17 PM, Paolo Carlini wrote:


Sorry, I don't know the code well enough to review your patch, but since I'm in 
CC, I still don't understand why, instead of adding a full libstdc++ testcase 
you are extending a C++ testcase, in old-deja even, normally considered legacy.


AFAIK, this is the only place we have such a test.  I suppose I could it 
put it into testsuite/18_support, but I would have to duplicate a bit of 
the machinery of the original test case because I can't just write a 
class and take the address of its constructor and destructor (whose 
addresses are passed to the tested functions).  I really didn't want to 
do that because there are some platform dependencies (the __ARM_EABI__ 
#ifdef).


Not sure if this makes sense, but those were my reasons.

--
Florian Weimer / Red Hat Product Security Team


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
> The debugger isn't the only consumer of debug info, and other tools might need
> a finer granularity than a GIMPLE location, so clearing EXPR_LOCATION to work
> around a debug info size issue seems very short-sighted (to say the least).

Hi, Eric,

There might be some misunderstanding here. Clearing EXPR_LOCATION is
not a workaround to reduce debug info size. It is aiming at making the
gcc implementation cleaner. And before we reset it, we need to truly
make sure nothing after gimple-low depends on it. Please let me
know if you have other concerns.

Thanks,
Dehao

>
> --
> Eric Botcazou


[PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Peter Bergner
I'm hitting the same bug as in PR53708 when compiling GLIBC's dlfcn.c when
vectorization is enabled on powerpc64-linux.  A reduced test case is:

bergner@bns:~/gcc/BUGS> cat foo.i 
static void (*const init_array []) (void)
  __attribute__ ((section (".init_array"), aligned (sizeof (void *)), used))
= { 0 };

bergner@bns:~/gcc/BUGS> /home/bergner/gcc/build/gcc-fsf-4_7-base/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-4_7-base/gcc -S -m64 -O3 -maltivec foo.i -o
bad.s

bergner@bns:~/gcc/BUGS> /home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc -S -m64 -O3 -maltivec foo.i
-o good.s

bergner@bns:~/gcc/BUGS> diff -u bad.s good.s 
--- bad.s2012-10-30 10:41:15.0 -0500
+++ good.s2012-10-30 10:41:23.0 -0500
@@ -2,7 +2,7 @@
 .section".toc","aw"
 .section".text"
 .section.init_array,"a"
-.align 4
+.align 3
 .typeinit_array, @object
 .sizeinit_array, 8
 init_array:

The above is bad, because the extra alignment causes the linker to add some
null padding to the init_array and the loader isn't expecting that and ends
up segv'ing.  I'd like to backport Richard's patch below to the 4.7 branch.
The patch bootstrapped and regtested on powerpc64-linux with no regressions.

Is it ok for the 4.7 branch?

Peter


Backport from mainline
2012-06-19  Richard Guenther  

PR tree-optimization/53708
* tree-vect-data-refs.c (vect_can_force_dr_alignment_p): Preserve
user-supplied alignment and alignment of decls with the used
attribute.
 
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 192988)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -4574,6 +4574,12 @@
   if (TREE_ASM_WRITTEN (decl))
 return false;
 
+  /* Do not override explicit alignment set by the user or the alignment
+ as specified by the ABI when the used attribute is set.  */
+  if (DECL_USER_ALIGN (decl)
+  || DECL_PRESERVE_P (decl))
+return false;
+
   if (TREE_STATIC (decl))
 return (alignment <= MAX_OFILE_ALIGNMENT);
   else




Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-10-30 Thread Teresa Johnson
On Tue, Oct 30, 2012 at 9:26 AM, Steven Bosscher  wrote:
> Hello,
>
> Hot/cold partitioning is apparently a hot topic all of a sudden, which
> is a good thing of course, because it's in need of some TLC.
>
> The attached patch adds another check to the RTL cfg checking
> (verify_flow_info) for the partitioning: A hot block can never be
> dominated by a cold block (because the dominated block must also be
> cold). This trips in PR55121.
>
> I haven't tested this with any profiling tests, but it's bound to
> break things. From my POV, whatever gets broken by this patch was
> already broken to begin with :-)   If you're in CC, it's because I
> hope you can help test this patch.

I will try testing your patch on top of mine with our fdo benchmarks.
For the others on the cc list, you may need to include my patch as
well for testing. Without it, -freorder-blocks-and-partition was DOA
for me. For my patch, see
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html

Teresa

>
> Downside of this patch is that I need dominance info. If it's not
> available, I compute and free it. I'm not sure if this works if
> dominance info status is DOM_NO_FAST_QUERY, and I don't want to
> recompute in this case because IMHO a verifier should be a no-op from
> the POV of the rest of the compiler, and updating dominators would
> make this patch a not-a-no-op :-)
>
> Thoughts/comments?
>
> Ciao!
> Steven
>
> * cfgrtl.c (rtl_verify_flow_info_1): Verify that blocks in the
> hot partition are not dominated by blocks in the cold partition.
>
> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 191819)
> +++ cfgrtl.c(working copy)
> @@ -2033,6 +2033,7 @@ rtl_verify_flow_info_1 (void)
>rtx x;
>int err = 0;
>basic_block bb;
> +  bool have_partitions = false;
>
>/* Check the general integrity of the basic blocks.  */
>FOR_EACH_BB_REVERSE (bb)
> @@ -2145,6 +2146,8 @@ rtl_verify_flow_info_1 (void)
> n_eh++;
>   else if (e->flags & EDGE_ABNORMAL)
> n_abnormal++;
> +
> + have_partitions |= is_crossing;
> }
>
>if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
> @@ -2263,6 +2268,40 @@ rtl_verify_flow_info_1 (void)
>   }
>  }
>
> +  /* If there are partitions, do a sanity check on them: A basic block in
> + a cold partition cannot dominate a basic block in a hot partition.  */
> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> +  if (have_partitions && !err)
> +FOR_EACH_BB (bb)
> +  if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> +   VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> +{
> +  bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +  basic_block son;
> +
> +  if (dom_calculated_here)
> +   calculate_dominance_info (CDI_DOMINATORS);
> +
> +  while (! VEC_empty (basic_block, bbs_in_cold_partition))
> +   {
> + bb = VEC_pop (basic_block, bbs_in_cold_partition);
> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> +   {
> + error ("non-cold basic block %d dominated "
> +"by a block in the cold partition", bb->index);
> + err = 1;
> +   }
> + for (son = first_dom_son (CDI_DOMINATORS, bb);
> +  son;
> +  son = next_dom_son (CDI_DOMINATORS, son))
> +   VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> +   }
> +
> +  if (dom_calculated_here)
> +   free_dominance_info (CDI_DOMINATORS);
> +}
> +
>/* Clean up.  */
>return err;
>  }



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Fix debug info for expr and jump stmt

2012-10-30 Thread Dehao Chen
> gcc/ChangeLog:
> 2012-10-25  Dehao Chen  
>
> * tree-eh.c (do_return_redirection): Set location for jump statement.
> (do_goto_redirection): Likewise.
> (frob_into_branch_around): Likewise.
> (lower_try_finally_nofallthru): Likewise.
> (lower_try_finally_copy): Likewise.
> (lower_try_finally_switch): Likewise.
> * expr.c (store_expr): Use current insn location instead of expr
> location.
> (expand_expr_real): Likewise.
> (expand_expr_real_1): Likewise.
>
> gcc/testsuite/ChangeLog:
> 2012-10-25  Dehao Chen  
>
> * g++.dg/debug/dwarf2/block.C: New testcase.

This patch bootstrapped and passed gcc regression tests.

Okay for trunk?

Thanks,
Dehao


Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
> Index: bb-reorder.c
> ===
> --- bb-reorder.c(revision 192692)
> +++ bb-reorder.c(working copy)
> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
> first_partition = BB_PARTITION (bb);
>if (BB_PARTITION (bb) != first_partition)
> {
> +  /* There should be a barrier between text sections. */
> +  emit_barrier_after (BB_END (bb->prev_bb));

So why isn't there one? There can't be a fall-through edge from one
section to the other, so cfgrtl.c:fixup_reorder_chain should have
added a barrier here already in the code under the comment:

  /* Now add jumps and labels as needed to match the blocks new
 outgoing edges.  */

Why isn't it doing that for you?

BTW, something else I noted in cfgrtl.c:
NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in
duplicate_insn_chain, so the following is necessary for robustness:

Index: cfgrtl.c
===
--- cfgrtl.c(revision 191819)
+++ cfgrtl.c(working copy)
@@ -3615,7 +3615,6 @@
  break;

case NOTE_INSN_EPILOGUE_BEG:
-   case NOTE_INSN_SWITCH_TEXT_SECTIONS:
  emit_note_copy (insn);
  break;


There can be only one! One note to rule them all! etc.


> Index: cfgrtl.c
> ===
> --- cfgrtl.c(revision 192692)
> +++ cfgrtl.c(working copy)
> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>   partition boundaries).  See  the comments at the top of
>   bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +  || BB_PARTITION (a) != BB_PARTITION (b))
>  return false;

My dislike for this whole scheme just continues to grow... How can
there be a REG_CROSSING_JUMP note if BB_PARTITION(a)==BB_PARTITION(b)?
That is a bug. We should not need the notes here.

As long as we have the CFG, BB_PARTITION(a)==BB_PARTITION(b) should be
the canonical way to check whether two blocks are in the same
partition, and the EDGE_CROSSING flag should be set iff an edge
crosses from one section to another. The REG_CROSSING_JUMP note should
only be used to see if a JUMP_INSN may jump to another section,
without having to check all successor edges.

Any place where we have to check the BB_PARTITION or
edge->flags&EDGE_CROSSING *and* REG_CROSSING_JUMP indicates a bug in
the partitioning updating.
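The consistency rule stated here (EDGE_CROSSING iff the endpoints' partitions differ, with REG_CROSSING_JUMP only a fast-path mirror of it) can be written as a tiny predicate. This is a sketch with made-up types, not the cfgrtl.c code:

```c
#include <assert.h>

enum partition { HOT, COLD };

struct toy_edge
{
  enum partition src_part, dst_part;
  int crossing;                 /* models the EDGE_CROSSING flag */
};

/* The crossing flag must be set iff the edge changes partitions;
   any divergence indicates broken partition updating.  */
static int
edge_partition_consistent_p (const struct toy_edge *e)
{
  return e->crossing == (e->src_part != e->dst_part);
}
```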

Another BTW: sched-vis.c doesn't handle REG_CROSSING_JUMP notes so
that slim RTL dumping breaks. I need this patchlet to make things
work:
Index: sched-vis.c
===
--- sched-vis.c (revision 191819)
+++ sched-vis.c (working copy)
@@ -553,6 +553,11 @@
 {
   char t1[BUF_LEN], t2[BUF_LEN], t3[BUF_LEN];

+  if (! x)
+{
+  sprintf (buf, "(nil)");
+  return;
+}
   switch (GET_CODE (x))
 {
 case SET:

Ciao!
Steven


Re: [PATCH] Replace const_vector with match_operand in sse.md

2012-10-30 Thread Andrey Turetskiy
Thanks for explanation, I understand it.
I fixed issue which you marked. Changelog is unchanged.

---
Best regards,
Andrey Turetskiy

On Tue, Oct 30, 2012 at 7:40 PM, Uros Bizjak  wrote:
> On Tue, Oct 30, 2012 at 3:47 PM, Andrey Turetskiy
>  wrote:
>> I changed the patch according Uros' remarks. Please, have a look.
>>
>> Changelog:
>>
>> 2012-10-30  Andrey Turetskiy  
>>
>>* config/i386/i386.c (bdesc_args): Rename 
>> CODE_FOR_avx2_umulhrswv16hi3 to
>>CODE_FOR_avx2_pmulhrswv16hi3.
>>* config/i386/predicates.md (const1_operand): Extend for vectors.
>>* config/i386/sse.md (ssse3_avx2): Extend.
>>(ssedoublemode): Ditto.
>>(_uavg3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
>>avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
>>(*_uavg3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
>>*avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
>>(PMULHRSW): New.
>>(_pmulhrsw3): Merge avx2_umulhrswv16hi3,
>>ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
>>(*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.
>>(*ssse3_pmulhrswv8hi3): Ditto.
>>(*ssse3_pmulhrswv4hi3): Ditto.
>
> +{
> +  ix86_fixup_binary_operands_no_copy (PLUS, mode, operands);
> +  operands[3] = CONST1_RTX(mode);
> +})
>
> Please put operands[3] initialization before the call to
> ix86_f_b_o_n_c. We don't want to pass uninitialized operands around.
>
> +{
> +  if (which_alternative == 0
> +  && (mode == V16QImode
> +  || mode == V8HImode))
> +return "pavg\t{%2, %0|%0, %2}";
> +  return "vpavg\t{%2, %1, %0|%0, %1, %2}";
> +}
> +  [(set (attr "isa")
> + (if_then_else
> +   (match_test "which_alternative == 0 && (mode ==
> V16QImode || mode == V8HImode)")
> + (const_string "noavx")
> + (const_string "avx")))
> +   (set (attr "prefix_data16")
> +(if_then_else (eq_attr "isa" "noavx")
> +  (const_string "1")
> +  (const_string "*")))
> +   (set (attr "prefix")
> +(if_then_else (eq_attr "isa" "noavx")
> +  (const_string "orig")
> +  (const_string "vex")))
>
> Uh, oh.
>
> Please note that "isa" attribute enables or disables the alternative
> through "enabled" attribute. Just change the "mode" attribute to
> "" and everything will magically work:
> - AVX2 implies AVX, so it enables alternative 1, while disabling
> alternative 0 (and vice versa when AVX is disabled through noavx isa
> attribute).
> - Correct modes are conditionally enabled via VI12_AVX2 iterator
> - Base ISA level is enabled via insn predicate (TARGET_SSE2).
>
> You have to track three dependent conditions to calculate how insn
> pattern/mode/operand predicate are enabled ;)
>
> Uros,


const_vector_replace.patch
Description: Binary data


Re: [patch,libgcc] m32r*rtems* add crtinit.o and crtfinit.o

2012-10-30 Thread Ian Lance Taylor
On Tue, Oct 30, 2012 at 12:10 AM, Ralf Corsepius
 wrote:
>
> I would like to apply the patch below to trunk and gcc-4.7-branch.
>
> This patch was originally submitted by Joel Sherrill back in May
> (http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01180.html),
> but had never received any feedback.
>
> It has been part of the rtems-gcc patches, since then.

You are an RTEMS maintainer; you don't need any additional approval to
commit this patch to trunk.

Ian


Re: [PATCH] Replace const_vector with match_operand in sse.md

2012-10-30 Thread Uros Bizjak
On Tue, Oct 30, 2012 at 6:53 PM, Andrey Turetskiy
 wrote:
> Thanks for explanation, I understand it.
> I fixed issue which you marked. Changelog is unchanged.
>
>>> I changed the patch according Uros' remarks. Please, have a look.
>>>
>>> Changelog:
>>>
>>> 2012-10-30  Andrey Turetskiy  
>>>
>>>* config/i386/i386.c (bdesc_args): Rename 
>>> CODE_FOR_avx2_umulhrswv16hi3 to
>>>CODE_FOR_avx2_pmulhrswv16hi3.
>>>* config/i386/predicates.md (const1_operand): Extend for vectors.
>>>* config/i386/sse.md (ssse3_avx2): Extend.
>>>(ssedoublemode): Ditto.
>>>(_uavg3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
>>>avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
>>>(*_uavg3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
>>>*avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
>>>(PMULHRSW): New.
>>>(_pmulhrsw3): Merge avx2_umulhrswv16hi3,
>>>ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
>>>(*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.

Replace const_vector with const1_operand predicate.

>>>(*ssse3_pmulhrswv8hi3): Ditto.
>>>(*ssse3_pmulhrswv4hi3): Ditto.

Yes, the patch is OK for mainline SVN.

BTW: Probably, pmulhrsw insn patterns can be merged, too, but this can
be a follow-up patch.

Thanks,
Uros.


Re: [patch] Unify bitmap interface.

2012-10-30 Thread Lawrence Crowl
On 10/30/12, Diego Novillo  wrote:
> On Oct 30, 2012 Bin.Cheng  wrote:
> > Just one question: Should we change the name of functions
> > "sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/
> > sbitmap_union_of_succs/sbitmap_union_of_preds" too? It might
> > be a little confusing that sbitmap_* is used among bitmap_*.
>
> Yes.  Lawrence is proceeding with this unification in stages.
> The next few patches should rename these.

Actually, I didn't know about them because they are auxiliary
routines declared outside of the bitmap files.  I've gone ahead
and changed them as well.

> The only two sets of functions that will remain separate for
> now are the iterators and the bitmap creation routines, I think.
> Lawrence?

The iterator functions have been unified, but not the iterator
type names.

-- 
Lawrence Crowl


Non-dominating loop bounds in tree-ssa-loop-niter 2/4

2012-10-30 Thread Jan Hubicka
Hi,
this patch implements the second part of the planned change: determining loop
bounds by shortest-path discovery.  This makes it possible to bound the number
of iterations of loops whose bounding statements do not dominate the latch.

I originally planned to implement this as part of maybe_lower_iteration_bound,
but I found doing it separately more readable.  While both perform a walk of
the loop body, they do so for different reasons.

discover_iteration_bound_by_body_walk walks from the header to the latch, while
maybe_lower_iteration_bound walks from the header to the first statement with
side effects, looking for an exit on the way.

Both walks can be skipped for many loops, but each for different reasons.

Bootstrapped/regtested x86_64-linux, OK?

* tree-ssa-loop-niter.c (double_int_cmp, bound_index,
discover_iteration_bound_by_body_walk): New functions.
(discover_iteration_bound_by_body_walk): Use it.

* gcc.dg/tree-ssa/loop-38.c: New testcase.

Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 192989)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -2955,6 +2955,234 @@ gcov_type_to_double_int (gcov_type val)
   return ret;
 }
 
+/* Compare double ints, callback for qsort.  */
+
+int
+double_int_cmp (const void *p1, const void *p2)
+{
+  const double_int *d1 = (const double_int *)p1;
+  const double_int *d2 = (const double_int *)p2;
+  if (*d1 == *d2)
+return 0;
+  if (d1->ult (*d2))
+return -1;
+  return 1;
+}
+
+/* Return index of BOUND in BOUNDS array sorted in increasing order.
+   Lookup by binary search.  */
+
+int
+bound_index (VEC (double_int, heap) *bounds, double_int bound)
+{
+  unsigned int end = VEC_length (double_int, bounds);
+  unsigned int begin = 0;
+
+  /* Find a matching index by means of a binary search.  */
+  while (begin != end)
+{
+  unsigned int middle = (begin + end) / 2;
+  double_int index = VEC_index (double_int, bounds, middle);
+
+  if (index == bound)
+   return middle;
+  else if (index.ult (bound))
+   begin = middle + 1;
+  else
+   end = middle;
+}
+  gcc_unreachable ();
+}
+
+/* Used to hold a vector of queues of basic blocks below.  */
+typedef VEC (basic_block, heap) *bb_queue;
+DEF_VEC_P(bb_queue);
+DEF_VEC_ALLOC_P(bb_queue,heap);
+
+/* We recorded loop bounds only for statements dominating loop latch (and thus
+   executed each loop iteration).  If there are any bounds on statements not
+   dominating the loop latch we can improve the estimate by walking the loop
+   body and seeing if every path from loop header to loop latch contains
+   some bounded statement.  */
+
+static void
+discover_iteration_bound_by_body_walk (struct loop *loop)
+{
+  pointer_map_t *bb_bounds;
+  struct nb_iter_bound *elt;
+  VEC (double_int, heap) *bounds = NULL;
+  VEC (bb_queue, heap) *queues = NULL;
+  bb_queue queue = NULL;
+  ptrdiff_t queue_index;
+  ptrdiff_t latch_index = 0;
+  pointer_map_t *visited;
+
+  /* Discover what bounds may interest us.  */
+  for (elt = loop->bounds; elt; elt = elt->next)
+{
+  double_int bound = elt->bound;
+
+  /* Exit terminates loop at given iteration, while non-exits produce undefined
+	 effect on the next iteration.  */
+  if (!elt->is_exit)
+   {
+ bound += double_int_one;
+ /* Overflow, give up on this bound.  */
+ if (bound == double_int_zero)
+   continue;
+   }
+
+  if (!loop->any_upper_bound
+ || bound.ult (loop->nb_iterations_upper_bound))
+VEC_safe_push (double_int, heap, bounds, bound);
+}
+
+  /* Exit early if there is nothing to do.  */
+  if (!bounds)
+return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, " Trying to walk loop body to reduce the bound.\n");
+
+  /* Sort the bounds in increasing order.  */
+  qsort (VEC_address (double_int, bounds), VEC_length (double_int, bounds),
+sizeof (double_int), double_int_cmp);
+
+  /* For every basic block record the lowest bound that is guaranteed to
+ terminate the loop.  */
+
+  bb_bounds = pointer_map_create ();
+  for (elt = loop->bounds; elt; elt = elt->next)
+{
+  double_int bound = elt->bound;
+  if (!elt->is_exit)
+   {
+ bound += double_int_one;
+ /* Overflow, give up on this bound.  */
+ if (bound == double_int_zero)
+   continue;
+   }
+
+  if (!loop->any_upper_bound
+ || bound.ult (loop->nb_iterations_upper_bound))
+   {
+ ptrdiff_t index = bound_index (bounds, bound);
+ void **entry = pointer_map_contains (bb_bounds,
+  gimple_bb (elt->stmt));
+ if (!entry)
+   *pointer_map_insert (bb_bounds,
+gimple_bb (elt->stmt)) = (void *)index;
+ else if ((ptrdiff_t)*entry > index)
+   *entry = (void *)index

Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Peter Bergner
On Tue, 2012-10-30 at 11:58 -0500, Peter Bergner wrote:
> I'm hitting the same bug as in PR53708 when compiling GLIBC's dlfcn.c when
> vectorization is enabled on powerpc64-linux.  A reduced test case is:
> 
> bergner@bns:~/gcc/BUGS> cat foo.i 
> static void (*const init_array []) (void)
>   __attribute__ ((section (".init_array"), aligned (sizeof (void *)), used))
> = { 0 };
> 
> bergner@bns:~/gcc/BUGS> /home/bergner/gcc/build/gcc-fsf-4_7-base/gcc/xgcc
> -B/home/bergner/gcc/build/gcc-fsf-4_7-base/gcc -S -m64 -O3 -maltivec foo.i -o
> bad.s
> 
> bergner@bns:~/gcc/BUGS> /home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc/xgcc
> -B/home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc -S -m64 -O3 -maltivec foo.i
> -o good.s
> 
> bergner@bns:~/gcc/BUGS> diff -u bad.s good.s 
> --- bad.s2012-10-30 10:41:15.0 -0500
> +++ good.s2012-10-30 10:41:23.0 -0500
> @@ -2,7 +2,7 @@
>  .section".toc","aw"
>  .section".text"
>  .section.init_array,"a"
> -.align 4
> +.align 3
>  .typeinit_array, @object
>  .sizeinit_array, 8
>  init_array:
> 
> The above is bad, because the extra alignment causes the linker to add some
> null padding to the init_array and the loader isn't expecting that and ends
> up segv'ing.  I'd like to backport Richard's patch below to the 4.7 branch.
> The patch bootstrapped and regtested on powerpc64-linux with no regressions.

Commenting on Richard's question from the bugzilla:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10

I suppose if attribute((__aligned__)) truly does just set a minimum alignment
value (and the documentation seems to say that) and the compiler is free to
arbitrarily increase it, then the GLIBC code to scan the init_array needs to
be tolerant of null values in init_array.

Does everyone agree with that assessment?

Peter





Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 6:48 PM, Steven Bosscher  wrote:
> On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
>> Index: bb-reorder.c
>> ===
>> --- bb-reorder.c(revision 192692)
>> +++ bb-reorder.c(working copy)
>> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
>> first_partition = BB_PARTITION (bb);
>>if (BB_PARTITION (bb) != first_partition)
>> {
>> +  /* There should be a barrier between text sections. */
>> +  emit_barrier_after (BB_END (bb->prev_bb));
>
> So why isn't there one? There can't be a fall-through edge from one
> section to the other, so cfgrtl.c:fixup_reorder_chain should have
> added a barrier here already in the code under the comment:
>
>   /* Now add jumps and labels as needed to match the blocks new
>  outgoing edges.  */
>
> Why isn't it doing that for you?

Maybe it's because fix_up_fall_thru_edges calls force_nonfallthru,
which is incorrectly inserting JUMP_INSNs and BARRIERs in cfglayout
mode. I'm going to test this patch:

Index: cfgrtl.c
===
--- cfgrtl.c(revision 192889)
+++ cfgrtl.c(working copy)
@@ -1511,16 +1511,17 @@ force_nonfallthru_and_redirect (edge e,
 #endif
}
   set_return_jump_label (BB_END (jump_block));
+  emit_barrier_after (BB_END (jump_block));
 }
-  else
+  else if (current_ir_type () == IR_RTL_CFGRTL)
 {
   rtx label = block_label (target);
   emit_jump_insn_after_setloc (gen_jump (label), BB_END (jump_block), loc);
   JUMP_LABEL (BB_END (jump_block)) = label;
   LABEL_NUSES (label)++;
+  emit_barrier_after (BB_END (jump_block));
 }

-  emit_barrier_after (BB_END (jump_block));
   redirect_edge_succ_nodup (e, target);

   if (abnormal_edge_flags)


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 01:43:33PM -0500, Peter Bergner wrote:
> Commenting on Richard's question from the bugzilla:
> 
>   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10
> 
> I suppose if attribute((__aligned__)) truly does just set a minimum alignment
> value (and the documentation seems to say that) and the compiler is free to
> arbitrarily increase it, then the GLIBC code to scan the init_array needs to
> be tolerant of null values in init_array.
> 
> Does everyone agree with that assessment?

I think the right test is for a user-supplied section name;
this approach is so common that I think we have to support it.
E.g. Linux kernel, glibc, prelink all use this, gcc itself too (e.g.
crtstuff.c).

Jakub


Re: Ping / update: RFA: replace #ifdef with if/#if for HAVE_ATTR_*

2012-10-30 Thread Richard Sandiford
I can't approve the whole thing of course, but I like the idea.
However...

Joern Rennecke  writes:
> +@deftypevr {Target Hook} bool TARGET_HAVE_CC0
> +@deftypevrx {Target Hook} {bool} TARGET_AUTO_INC_DEC
> +@deftypevrx {Target Hook} {bool} TARGET_STACK_REGS
> +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_ENABLED
> +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH
> +These flags are automatically generated;  you should not override them in 
> @file{tm.c}.
> +@end deftypevr

Unless this goes against something already discussed, I think it'd be
better to leave these out until there's a concerted attempt to use them
somewhere.  On its own this isn't even a partial transition. :-)

> +  /* We make an exception here to provide stub definitions for
> + insn_*_length* / get_attr_enabled functions.  */
> +  puts ("#if !HAVE_ATTR_length\n"
> + "extern int hook_int_rtx_0 (rtx);\n"
> + "#define insn_default_length hook_int_rtx_0\n"
> + "#define insn_min_length hook_int_rtx_0\n"
> + "#define insn_variable_length_p hook_int_rtx_0\n"
> + "#define insn_current_length hook_int_rtx_0\n"
> + "#include \"insn-addr.h\"\n"
> + "#endif\n"

I'd prefer defaults that call gcc_unreachable, rather than silently
return an arbitrary value.  That said,

> + "#if !HAVE_ATTR_enabled\n"
> + "extern int hook_int_rtx_1 (rtx);\n"
> + "#define get_attr_enabled hook_int_rtx_1\n"
> + "#endif\n");

I agree that 1 is a safe default here.

Richard


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Peter Bergner
On Tue, 2012-10-30 at 19:55 +0100, Jakub Jelinek wrote:
> On Tue, Oct 30, 2012 at 01:43:33PM -0500, Peter Bergner wrote:
> > Commenting on Richard's question from the bugzilla:
> > 
> >   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10
> > 
> > I suppose if attribute((__aligned__)) truly does just set a minimum 
> > alignment
> > value (and the documentation seems to say that) and the compiler is free to
> > arbitrarily increase it, then the GLIBC code to scan the init_array needs to
> > be tolerant of null values in init_array.
> > 
> > Does everyone agree with that assessment?
> 
> I think the right test is for a user-supplied section name;
> this approach is so common that I think we have to support it.
> E.g. Linux kernel, glibc, prelink all use this, gcc itself too (e.g.
> crtstuff.c).

Ok, then I'll bootstrap and regtest your suggested change while we
wait for richi to comment.  I'm fine with whatever you and richi
decide is best.  The ObjC guys should probably test it though too.

I assume you think we should change the current trunk code as well?

Peter





[committed] A global default for SLOW_UNALIGNED_ACCESS

2012-10-30 Thread Richard Sandiford
This patch replaces three separate default definitions of
SLOW_UNALIGNED_ACCESS with a single global one.  Note that tm.texi
requires SLOW_UNALIGNED_ACCESS to be true if STRICT_ALIGNMENT.

Tested on x86_64-linux-gnu, powerpc64-linux-gnu and mipsisa64-elf.
Applied as obvious.

Richard


gcc/
* defaults.h (SLOW_UNALIGNED_ACCESS): Provide default definition.
* expmed.c (SLOW_UNALIGNED_ACCESS): Remove default definition.
* expr.c (SLOW_UNALIGNED_ACCESS): Likewise.
* lra-constraints.c (SLOW_UNALIGNED_ACCESS): Likewise.
(simplify_operand_subreg): Don't check STRICT_ALIGNMENT here.

Index: gcc/defaults.h
===
--- gcc/defaults.h  2012-08-02 21:10:06.0 +0100
+++ gcc/defaults.h  2012-10-28 10:30:47.340353996 +
@@ -1218,6 +1218,10 @@ #define MINIMUM_ALIGNMENT(EXP,MODE,ALIGN
 #define ATTRIBUTE_ALIGNED_VALUE BIGGEST_ALIGNMENT
 #endif
 
+#ifndef SLOW_UNALIGNED_ACCESS
+#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
+#endif
+
 /* For most ports anything that evaluates to a constant symbolic
or integer value is acceptable as a constant address.  */
 #ifndef CONSTANT_ADDRESS_P
Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-28 10:25:12.0 +
+++ gcc/expmed.c2012-10-28 10:30:44.178354004 +
@@ -69,11 +69,6 @@ static rtx expand_sdiv_pow2 (enum machin
 /* Test whether a value is zero of a power of two.  */
 #define EXACT_POWER_OF_2_OR_ZERO_P(x) (((x) & ((x) - 1)) == 0)
 
-#ifndef SLOW_UNALIGNED_ACCESS
-#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
-#endif
-
-
 /* Reduce conditional compilation elsewhere.  */
 #ifndef HAVE_insv
 #define HAVE_insv  0
Index: gcc/expr.c
===
--- gcc/expr.c  2012-10-25 10:08:06.0 +0100
+++ gcc/expr.c  2012-10-28 10:31:44.133353857 +
@@ -189,12 +189,6 @@ #define STORE_BY_PIECES_P(SIZE, ALIGN) \
   (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
< (unsigned int) MOVE_RATIO (optimize_insn_for_speed_p ()))
 #endif
-
-/* SLOW_UNALIGNED_ACCESS is nonzero if unaligned accesses are very slow.  */
-
-#ifndef SLOW_UNALIGNED_ACCESS
-#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
-#endif
 
 /* This is run to set up which modes can be used
directly in memory and to initialize the block move optab.  It is run
Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c   2012-10-26 11:50:16.0 +0100
+++ gcc/lra-constraints.c   2012-10-28 10:32:02.499353813 +
@@ -1105,10 +1105,6 @@ process_addr_reg (rtx *loc, rtx *before,
   return true;
 }
 
-#ifndef SLOW_UNALIGNED_ACCESS
-#define SLOW_UNALIGNED_ACCESS(mode, align) 0
-#endif
-
 /* Make reloads for subreg in operand NOP with internal subreg mode
REG_MODE, add new reloads for further processing.  Return true if
any reload was generated.  */
@@ -1132,8 +1128,7 @@ simplify_operand_subreg (int nop, enum m
  address might violate the necessary alignment or the access might
  be slow.  So take this into consideration. */
   if ((MEM_P (reg)
-   && ((! STRICT_ALIGNMENT
-   && ! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg)))
+   && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
   || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))
   || (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER))
 {


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-10-30 Thread Jason Merrill

On 10/27/2012 09:16 PM, Sriraman Tallam wrote:

+ /* See if there's a match.   For functions that are multi-versioned,
+all the versions match.  */
  if (same_type_p (target_fn_type, static_fn_type (fn)))
-   matches = tree_cons (fn, NULL_TREE, matches);
+   {
+ matches = tree_cons (fn, NULL_TREE, matches);
+ /*If versioned, push all possible versions into a vector.  */
+ if (DECL_FUNCTION_VERSIONED (fn))
+   {
+ if (fn_ver_vec == NULL)
+  fn_ver_vec = VEC_alloc (tree, heap, 2);
+ VEC_safe_push (tree, heap, fn_ver_vec, fn);
+   }
+   }


Why do we need to keep both a list and vector of the matches?


+Call decls_match to make sure they are different because they are
+versioned.  */
+  if (DECL_FUNCTION_VERSIONED (fn))
+   {
+  for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match))
+   if (decls_match (fn, TREE_PURPOSE (match)))
+ break;
+   }


What if you have multiple matches that aren't all versions of the same 
function?


Why would it be a problem to have two separate declarations of the same 
function?



+  dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec);


Is the idea here that if you have some versions declared, then a call, 
then more versions declared, then another call, you will call two 
different dispatchers, where the first one will only dispatch to the 
versions declared before the first call?  If not, why do we care about 
the set of declarations at this point?



+  /* Mark this functio to be output.  */
+  node->local.finalized = 1;


Missing 'n' in "function".


@@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl)
  else
break;
}
-  name = DECL_ASSEMBLER_NAME (decl);
+  if (TREE_CODE (decl) == FUNCTION_DECL
+&& DECL_FUNCTION_VERSIONED (decl))
+  name = DECL_NAME (decl);


This would mean that f in the global namespace and f in namespace foo 
would end up in the same comdat group.  Why do we need special handling 
here at all?



 dump_function_name (tree t, int flags)
 {
-  tree name = DECL_NAME (t);
+  tree name;

+  /* For function versions, use the assembler name as the decl name is
+ the same for all versions.  */
+  if (TREE_CODE (t) == FUNCTION_DECL
+  && DECL_FUNCTION_VERSIONED (t))
+name = DECL_ASSEMBLER_NAME (t);


This shouldn't be necessary; we should print the target attribute when 
printing the function declaration.



+Also, mark this function as needed if it is marked inline but
+is a multi-versioned function.  */
+  if (((flag_keep_inline_functions
+   || DECL_FUNCTION_VERSIONED (fn))


This should be marked as needed by the code that builds the dispatcher.


+  /* For calls to a multi-versioned function, overload resolution
+ returns the function with the highest target priority, that is,
+ the version that will checked for dispatching first.  If this
+ version is inlinable, a direct call to this version can be made
+ otherwise the call should go through the dispatcher.  */


I'm a bit confused why people would want both dispatched calls and 
non-dispatched inlining; I would expect that if a function can be 
compiled differently enough on newer hardware to make versioning 
worthwhile, that would be a larger difference than the call overhead.



+  if (DECL_FUNCTION_VERSIONED (fn)
+  && !targetm.target_option.can_inline_p (current_function_decl, fn))
+{
+  struct cgraph_node *dispatcher_node = NULL;
+  fn = get_function_version_dispatcher (fn);
+  if (fn == NULL)
+   return NULL;
+  dispatcher_node = cgraph_get_create_node (fn);
+  gcc_assert (dispatcher_node != NULL);
+  /* Mark this function to be output.  */
+  dispatcher_node->local.finalized = 1;
+}


Why do you need to mark this here?  If you generate a call to the 
dispatcher, cgraph should mark it to be output automatically.



+  /* For candidates of a multi-versioned function,  make the version with
+ the highest priority win.  This version will be checked for dispatching
+ first.  If this version can be inlined into the caller, the front-end
+ will simply make a direct call to this function.  */


This is still too high in joust.  I believe I said before that this code 
should come just above


   /* If the two function declarations represent the same function 
(this can
  happen with declarations in multiple scopes and arg-dependent 
lookup),
  arbitrarily choose one.  But first make sure the default args 
we're

  using match.  */


+  /* For multiversioned functions, aggregate all the versions here for
+ generating the dispatcher body later if necessary.  Check to see if
+ the dispatcher is already generated to avoid doing this more than
+ once.  */


Thi

[0/8] Preparing for insv and ext(z)v optabs

2012-10-30 Thread Richard Sandiford
I'm finishing off some patches to allow insv, extv and extzv to be
defined as normal direct optabs (such as "insvsi" and "insvdi" rather
than just "insv").  This series of patches does some groundwork to make
that possible.

The patches are not supposed to change the generated code.  I checked
that the assembly output for a set of gcc .ii files didn't change
for x86_64-linux-gnu, powerpc64-linux-gnu and mips64-linux-gnu.
Also regression-tested on x86_64-linux-gnu, powerpc64-linux-gnu
and mips64-elf.

Richard


[1/8] Remove redundant BLKmode test

2012-10-30 Thread Richard Sandiford
This patch removes what I believe is a redundant check in store_bit_field_1
for whether the value to insert (i.e. the rhs) has BLKmode.  We shouldn't
see BLKmode values here, and even if we did, the only effect of the test
is to fall through to store_fixed_bit_field, which can't handle BLKmode
either.  Specifically, store_fixed_bit_field would call:

  if (GET_MODE (value) != mode)
value = convert_to_mode (mode, value, 1);

and convert_to_mode ICEs on BLKmode values.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_1): Remove test for BLKmode values.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-28 10:40:22.533352589 +
+++ gcc/expmed.c2012-10-28 10:40:23.119352588 +
@@ -670,7 +670,6 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
   if (HAVE_insv
-  && GET_MODE (value) != BLKmode
   && bitsize > 0
   && GET_MODE_BITSIZE (op_mode) >= bitsize
   /* Do not use insv for volatile bitfields when


[2/8] Remove redundant MAX_MACHINE_MODE tests

2012-10-30 Thread Richard Sandiford
extract_bit_field_1 has a block beginning:

  /* If OP0 is a memory, try copying it to a register and seeing if a
 cheap register alternative is available.  */
  if (ext_mode != MAX_MACHINE_MODE && MEM_P (op0))
{

and within it there are tests for whether ext_mode != MAX_MACHINE_MODE.
This patch removes them.

store_bit_field_1 has a similar block, but it uses:

  /* If OP0 is a memory, try copying it to a register and seeing if a
 cheap register alternative is available.  */
  if (HAVE_insv && MEM_P (op0))
{

However, by definition, HAVE_insv is equivalent to op_mode != MAX_MACHINE_MODE,
so this patch changes it to that form for consistency.  It's then obvious
that the corresponding op_mode != MAX_MACHINE_MODE tests are redundant too.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_1): Use OP_MODE to check whether an
insv pattern is available.  Remove redundant checks for OP_MODE
being MAX_MACHINE_MODE.
(extract_bit_field_1): Remove redundant checks for EXT_MODE being
MAX_MACHINE_MODE.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-28 10:54:24.0 +
+++ gcc/expmed.c2012-10-28 10:54:53.715350457 +
@@ -669,7 +669,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
  in a word.  */
 
   enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
-  if (HAVE_insv
+  if (op_mode != MAX_MACHINE_MODE
   && bitsize > 0
   && GET_MODE_BITSIZE (op_mode) >= bitsize
   /* Do not use insv for volatile bitfields when
@@ -788,7 +788,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   /* If OP0 is a memory, try copying it to a register and seeing if a
  cheap register alternative is available.  */
-  if (HAVE_insv && MEM_P (op0))
+  if (op_mode != MAX_MACHINE_MODE && MEM_P (op0))
 {
   enum machine_mode bestmode;
   unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
@@ -803,13 +803,10 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   if (GET_MODE (op0) == BLKmode
  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
- || (op_mode != MAX_MACHINE_MODE
- && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
+ || GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode))
bestmode = get_best_mode (bitsize, bitnum,
  bitregion_start, bitregion_end,
- MEM_ALIGN (op0),
- (op_mode == MAX_MACHINE_MODE
-  ? VOIDmode : op_mode),
+ MEM_ALIGN (op0), op_mode,
  MEM_VOLATILE_P (op0));
   else
bestmode = GET_MODE (op0);
@@ -1597,12 +1594,9 @@ extract_bit_field_1 (rtx str_rtx, unsign
 smallest mode containing the field.  */
 
   if (GET_MODE (op0) == BLKmode
- || (ext_mode != MAX_MACHINE_MODE
- && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
+ || GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode))
bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
- (ext_mode == MAX_MACHINE_MODE
-  ? VOIDmode : ext_mode),
- MEM_VOLATILE_P (op0));
+ ext_mode, MEM_VOLATILE_P (op0));
   else
bestmode = GET_MODE (op0);
 


[3/8] Split out insv and ext(z)v code

2012-10-30 Thread Richard Sandiford
This patch splits out the code to handle insv and ext(z)v from
store_bit_field_1 and extract_bit_field_1 respectively.  I removed
"x" prefixes from some of the variables and tried to make the placement
of the REG and SUBREG handling more consistent, but there are no
behavioural changes.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_using_insv): New function,
split out from...
(store_bit_field_1): ...here.
(extract_bit_field_using_extv): New function, split out from...
(extract_bit_field_1): ...here.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-29 12:56:45.260327155 +
+++ gcc/expmed.c2012-10-29 13:11:08.244325044 +
@@ -404,6 +404,120 @@ lowpart_bit_field_p (unsigned HOST_WIDE_
 return bitnum % BITS_PER_WORD == 0;
 }
 
+/* Try to use an insv pattern to store VALUE into a field of OP0.
+   OP_MODE is the mode of the insertion and BITSIZE and BITNUM are
+   as for store_bit_field.  */
+
+static bool
+store_bit_field_using_insv (rtx op0, unsigned HOST_WIDE_INT bitsize,
+   unsigned HOST_WIDE_INT bitnum, rtx value,
+   enum machine_mode op_mode)
+{
+  struct expand_operand ops[4];
+  rtx value1;
+  rtx xop0 = op0;
+  rtx last = get_last_insn ();
+  bool copy_back = false;
+
+  unsigned int unit = GET_MODE_BITSIZE (op_mode);
+  if (bitsize == 0 || bitsize > unit)
+return false;
+
+  if (MEM_P (xop0))
+{
+  /* Get a reference to the first byte of the field.  */
+  xop0 = adjust_bitfield_address (xop0, byte_mode, bitnum / BITS_PER_UNIT);
+  bitnum %= BITS_PER_UNIT;
+}
+  else
+{
+  /* Convert from counting within OP0 to counting in OP_MODE.  */
+  if (BYTES_BIG_ENDIAN)
+   bitnum += unit - GET_MODE_BITSIZE (GET_MODE (op0));
+
+  /* If xop0 is a register, we need it in OP_MODE
+to make it acceptable to the format of insv.  */
+  if (GET_CODE (xop0) == SUBREG)
+   /* We can't just change the mode, because this might clobber op0,
+  and we will need the original value of op0 if insv fails.  */
+   xop0 = gen_rtx_SUBREG (op_mode, SUBREG_REG (xop0), SUBREG_BYTE (xop0));
+  if (REG_P (xop0) && GET_MODE (xop0) != op_mode)
+   xop0 = gen_lowpart_SUBREG (op_mode, xop0);
+}
+
+  /* If the destination is a paradoxical subreg such that we need a
+ truncate to the inner mode, perform the insertion on a temporary and
+ truncate the result to the original destination.  Note that we can't
+ just truncate the paradoxical subreg as (truncate:N (subreg:W (reg:N
+ X) 0)) is (reg:N X).  */
+  if (GET_CODE (xop0) == SUBREG
+  && REG_P (SUBREG_REG (xop0))
+  && !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (SUBREG_REG (xop0)),
+op_mode))
+{
+  rtx tem = gen_reg_rtx (op_mode);
+  emit_move_insn (tem, xop0);
+  xop0 = tem;
+  copy_back = true;
+}
+
+  /* If BITS_BIG_ENDIAN is zero on a BYTES_BIG_ENDIAN machine, we count
+ "backwards" from the size of the unit we are inserting into.
+ Otherwise, we count bits from the most significant on a
+ BYTES/BITS_BIG_ENDIAN machine.  */
+
+  if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+bitnum = unit - bitsize - bitnum;
+
+  /* Convert VALUE to op_mode (which insv insn wants) in VALUE1.  */
+  value1 = value;
+  if (GET_MODE (value) != op_mode)
+{
+  if (GET_MODE_BITSIZE (GET_MODE (value)) >= bitsize)
+   {
+ /* Optimization: Don't bother really extending VALUE
+if it has all the bits we will actually use.  However,
+if we must narrow it, be sure we do it correctly.  */
+
+ if (GET_MODE_SIZE (GET_MODE (value)) < GET_MODE_SIZE (op_mode))
+   {
+ rtx tmp;
+
+ tmp = simplify_subreg (op_mode, value1, GET_MODE (value), 0);
+ if (! tmp)
+   tmp = simplify_gen_subreg (op_mode,
+  force_reg (GET_MODE (value),
+ value1),
+  GET_MODE (value), 0);
+ value1 = tmp;
+   }
+ else
+   value1 = gen_lowpart (op_mode, value1);
+   }
+  else if (CONST_INT_P (value))
+   value1 = gen_int_mode (INTVAL (value), op_mode);
+  else
+   /* Parse phase is supposed to make VALUE's data type
+  match that of the component reference, which is a type
+  at least as wide as the field; so VALUE should have
+  a mode that corresponds to that type.  */
+   gcc_assert (CONSTANT_P (value));
+}
+
+  create_fixed_operand (&ops[0], xop0);
+  create_integer_operand (&ops[1], bitsize);
+  create_integer_operand (&ops[2], bitnum);
+  create_input_operand (&ops[3], value1, op_mode);
+  if (maybe_expand_insn (CO

[4/8] Separate reg and mem uses of insv and ext(z)v

2012-10-30 Thread Richard Sandiford
This patch simply separates out the MEM and non-MEM insv and ext(z)v cases.
On its own, it's probably a wash whether this is an improvement or not,
but it makes the optabs patches much easier.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_1): Move generation of MEM insvs
to the MEM_P block.
(extract_bit_field_1): Likewise extvs and extzvs.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-28 10:55:22.754350385 +
+++ gcc/expmed.c2012-10-28 10:55:29.249350370 +
@@ -784,16 +784,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
   if (op_mode != MAX_MACHINE_MODE
-  /* Do not use insv for volatile bitfields when
- -fstrict-volatile-bitfields is in effect.  */
-  && !(MEM_P (op0) && MEM_VOLATILE_P (op0)
-  && flag_strict_volatile_bitfields > 0)
-  /* Do not use insv if the bit region is restricted and
-op_mode integer at offset doesn't fit into the
-restricted region.  */
-  && !(MEM_P (op0) && bitregion_end
-  && bitnum - (bitnum % BITS_PER_UNIT) + GET_MODE_BITSIZE (op_mode)
- > bitregion_end + 1)
+  && !MEM_P (op0)
   && store_bit_field_using_insv (op0, bitsize, bitnum, value, op_mode))
 return true;
 
@@ -804,6 +795,18 @@ store_bit_field_1 (rtx str_rtx, unsigned
   enum machine_mode bestmode;
   unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
 
+  /* Do not use insv for volatile bitfields when
+ -fstrict-volatile-bitfields is in effect.  */
+  if (!(MEM_VOLATILE_P (op0) && flag_strict_volatile_bitfields > 0)
+ /* Do not use insv if the bit region is restricted and
+an op_mode integer doesn't fit into the restricted region.  */
+ && !(bitregion_end
+  && (bitnum - (bitnum % BITS_PER_UNIT)
+  + GET_MODE_BITSIZE (op_mode)
+  > bitregion_end + 1))
+ && store_bit_field_using_insv (op0, bitsize, bitnum, value, op_mode))
+   return true;
+
   if (bitregion_end)
maxbits = bitregion_end - bitregion_start + 1;
 
@@ -1594,11 +1597,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
  If OP0 is a register, it too fits within a word.  */
 
   ext_mode = mode_for_extraction (unsignedp ? EP_extzv : EP_extv, 0);
-  if (ext_mode != MAX_MACHINE_MODE
-  /* Do not use extv/extzv for volatile bitfields when
- -fstrict-volatile-bitfields is in effect.  */
-  && !(MEM_P (op0) && MEM_VOLATILE_P (op0)
-  && flag_strict_volatile_bitfields > 0))
+  if (ext_mode != MAX_MACHINE_MODE && !MEM_P (op0))
 {
   rtx result = extract_bit_field_using_extv (op0, bitsize, bitnum,
 unsignedp, target, mode,
@@ -1613,6 +1612,17 @@ extract_bit_field_1 (rtx str_rtx, unsign
 {
   enum machine_mode bestmode;
 
+  /* Do not use extv/extzv for volatile bitfields when
+ -fstrict-volatile-bitfields is in effect.  */
+  if (!(MEM_VOLATILE_P (op0) && flag_strict_volatile_bitfields > 0))
+   {
+ rtx result = extract_bit_field_using_extv (op0, bitsize, bitnum,
+unsignedp, target, mode,
+tmode, ext_mode);
+ if (result)
+   return result;
+   }
+
   /* Get the mode to use for inserting into this field.  If
 OP0 is BLKmode, get the smallest mode consistent with the
 alignment. If OP0 is a non-BLKmode object that is no


Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-10-30 Thread Steven Bosscher
On Tue, Oct 30, 2012 at 5:59 PM, Teresa Johnson wrote:
> I will try testing your patch on top of mine with our fdo benchmarks.

Thanks. But you should expect a lot of errors, hopefully you can make
something out of it for Bugzilla.

> For the others on the cc list, you may need to include my patch as
> well for testing. Without it, -freorder-blocks-and-partition was DOA
> for me. For my patch, see
> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html

Ah, the fate of a pass that isn't enabled by default. From what I can
tell, looking at this code the past few hours, it's hopelessly broken
at the moment:

* It doesn't work with cfglayout mode, even though it is in between
pass_into_cfg_layout_mode and pass_outof_cfg_layout_mode. Coming out
of cfglayout mode

* Coming out of cfglayout mode, fixup_reorder_chain adds
REG_CROSSING_JUMP -- but so does force_nonfallthru_and_redirect called
just before it. So we end up with >1 such note, or with notes on jumps
that shouldn't have them.

* The scheduler doesn't respect the partitioning at all. remove_notes
happily removes the section split note.

* etc.


This pass may need some major work to be in good order again.

Ciao!
Steven


[5/8] Add narrow_bit_field_mem

2012-10-30 Thread Richard Sandiford
This patch splits out a fairly common operation: that of narrowing a MEM
to a particular mode and adjusting the bit number accordingly.

I've kept with "bit_field" rather than "bitfield" for consistency with
the callers, although we do have "bitfield" in "adjust_bitfield_address".

Tested as described in the covering note.  OK to install?

Richard


gcc/
* expmed.c (narrow_bit_field_mem): New function.
(store_bit_field_using_insv, store_bit_field_1, store_fixed_bit_field)
(extract_bit_field_1): Use it.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-10-30 19:25:44.797368678 +
+++ gcc/expmed.c2012-10-30 19:25:47.730368671 +
@@ -387,6 +387,23 @@ mode_for_extraction (enum extraction_pat
   return data->operand[opno].mode;
 }
 
+/* Adjust bitfield memory MEM so that it points to the first unit of
+   mode MODE that contains the bitfield at bit position BITNUM.
+   Set NEW_BITNUM to the bit position of the field within the
+   new memory.  */
+
+static rtx
+narrow_bit_field_mem (rtx mem, enum machine_mode mode,
+ unsigned HOST_WIDE_INT bitnum,
+ unsigned HOST_WIDE_INT &new_bitnum)
+{
+  unsigned int unit = GET_MODE_BITSIZE (mode);
+  unsigned HOST_WIDE_INT offset = bitnum / unit * GET_MODE_SIZE (mode);
+  mem = adjust_bitfield_address (mem, mode, offset);
+  new_bitnum = bitnum % unit;
+  return mem;
+}
+
 /* Return true if a bitfield of size BITSIZE at bit number BITNUM within
a structure of mode STRUCT_MODE represents a lowpart subreg.   The subreg
offset is then BITNUM / BITS_PER_UNIT.  */
@@ -424,11 +441,8 @@ store_bit_field_using_insv (rtx op0, uns
 return false;
 
   if (MEM_P (xop0))
-{
-  /* Get a reference to the first byte of the field.  */
-  xop0 = adjust_bitfield_address (xop0, byte_mode, bitnum / BITS_PER_UNIT);
-  bitnum %= BITS_PER_UNIT;
-}
+/* Get a reference to the first byte of the field.  */
+xop0 = narrow_bit_field_mem (xop0, byte_mode, bitnum, bitnum);
   else
 {
   /* Convert from counting within OP0 to counting in OP_MODE.  */
@@ -831,18 +845,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
   && GET_MODE_BITSIZE (bestmode) > MEM_ALIGN (op0)))
{
  rtx last, tempreg, xop0;
- unsigned int unit;
- unsigned HOST_WIDE_INT offset, bitpos;
+ unsigned HOST_WIDE_INT bitpos;
 
  last = get_last_insn ();
 
  /* Adjust address to point to the containing unit of
 that mode.  Compute the offset as a multiple of this unit,
 counting in bytes.  */
- unit = GET_MODE_BITSIZE (bestmode);
- offset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
- bitpos = bitnum % unit;
- xop0 = adjust_bitfield_address (op0, bestmode, offset);
+ xop0 = narrow_bit_field_mem (op0, bestmode, bitnum, bitpos);
 
  /* Fetch that unit, store the bitfield in it, then store
 the unit.  */
@@ -975,9 +985,7 @@ store_fixed_bit_field (rtx op0, unsigned
  return;
}
 
-  HOST_WIDE_INT bit_offset = bitnum - bitnum % GET_MODE_BITSIZE (mode);
-  op0 = adjust_bitfield_address (op0, mode, bit_offset / BITS_PER_UNIT);
-  bitnum -= bit_offset;
+  op0 = narrow_bit_field_mem (op0, mode, bitnum, bitnum);
 }
 
   mode = GET_MODE (op0);
@@ -1246,11 +1254,8 @@ extract_bit_field_using_extv (rtx op0, u
 return NULL_RTX;
 
   if (MEM_P (op0))
-{
-  /* Get a reference to the first byte of the field.  */
-  op0 = adjust_bitfield_address (op0, byte_mode, bitnum / BITS_PER_UNIT);
-  bitnum %= BITS_PER_UNIT;
-}
+/* Get a reference to the first byte of the field.  */
+op0 = narrow_bit_field_mem (op0, byte_mode, bitnum, bitnum);
   else
 {
   /* Convert from counting within OP0 to counting in EXT_MODE.  */
@@ -1640,23 +1645,19 @@ extract_bit_field_1 (rtx str_rtx, unsign
  && !(SLOW_UNALIGNED_ACCESS (bestmode, MEM_ALIGN (op0))
   && GET_MODE_BITSIZE (bestmode) > MEM_ALIGN (op0)))
{
- unsigned HOST_WIDE_INT offset, bitpos;
-
- /* Compute the offset as a multiple of this unit,
-counting in bytes.  */
- unsigned int unit = GET_MODE_BITSIZE (bestmode);
- offset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
- bitpos = bitnum % unit;
+ unsigned HOST_WIDE_INT bitpos;
+ rtx xop0 = narrow_bit_field_mem (op0, bestmode, bitnum, bitpos);
 
- /* Make sure the register is big enough for the whole field.  */
- if (bitpos + bitsize <= unit)
+ /* Make sure the register is big enough for the whole field.
+(It might not be if bestmode == GET_MODE (op0) and the input
+code was invalid.)  */
+ if (bitpos + bitsize <= GET_MODE_BITSIZE (bestmode))
{
- rtx last, result, xop0;
+ rtx last,

[6/8] Simplify make_extraction

2012-10-30 Thread Richard Sandiford
On a change of tack, this tackles some redundant code in combine.
It has code to convert a variable bit position to the mode required
by the bit position operand to insv, extv or extzv:

[A]
  else if (pos_rtx != 0
   && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
pos_rtx = gen_lowpart (pos_mode, pos_rtx);

However, this is protected by an earlier:

[B]
  if (pos_rtx && GET_MODE (pos_rtx) != VOIDmode
  && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
pos_mode = GET_MODE (pos_rtx);

so [B] makes [A] redundant.  That leaves this block:

  /* Adjust mode of POS_RTX, if needed.  If we want a wider mode, we
 have to zero extend.  Otherwise, we can just use a SUBREG.  */
  if (pos_rtx != 0
  && GET_MODE_SIZE (pos_mode) > GET_MODE_SIZE (GET_MODE (pos_rtx)))
{

as the only use of pos_mode, so [B] can go too.

Similarly, the memory case has:

[C]
  /* If we have to change the mode of memory and cannot, the desired mode
 is EXTRACTION_MODE.  */
  if (inner_mode != wanted_inner_mode
  && (mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
  || MEM_VOLATILE_P (inner)
  || pos_rtx))
wanted_inner_mode = extraction_mode;

but those same conditions are repeated in the code that actually
applies wanted_inner_mode:

  /* If INNER has a wider mode, and this is a constant extraction, try to
 make it smaller and adjust the byte to point to the byte containing
 the value.  */
  if (wanted_inner_mode != VOIDmode
  && inner_mode != wanted_inner_mode
  && ! pos_rtx
  && GET_MODE_SIZE (wanted_inner_mode) < GET_MODE_SIZE (is_mode)
  && MEM_P (inner)
  && ! mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
  && ! MEM_VOLATILE_P (inner))
{

so [C] too is unnecessary.

Tested as described in the covering note.  OK to install?

Richard


gcc/
* combine.c (make_extraction): Remove dead wanted_inner_mode-
and pos_rtx-related code.

Index: gcc/combine.c
===
--- gcc/combine.c   2012-10-27 22:22:45.0 +0100
+++ gcc/combine.c   2012-10-27 22:32:26.811370582 +0100
@@ -7211,10 +7211,6 @@ make_extraction (enum machine_mode mode,
   && GET_MODE_SIZE (extraction_mode) < GET_MODE_SIZE (mode))
 extraction_mode = mode;
 
-  if (pos_rtx && GET_MODE (pos_rtx) != VOIDmode
-  && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
-pos_mode = GET_MODE (pos_rtx);
-
   /* If this is not from memory, the desired mode is the preferred mode
  for an extraction pattern's first input operand, or word_mode if there
  is none.  */
@@ -7231,14 +7227,6 @@ make_extraction (enum machine_mode mode,
  wanted_inner_mode = GET_MODE_WIDER_MODE (wanted_inner_mode);
  gcc_assert (wanted_inner_mode != VOIDmode);
}
-
-  /* If we have to change the mode of memory and cannot, the desired mode
-is EXTRACTION_MODE.  */
-  if (inner_mode != wanted_inner_mode
- && (mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
- || MEM_VOLATILE_P (inner)
- || pos_rtx))
-   wanted_inner_mode = extraction_mode;
 }
 
   orig_pos = pos;
@@ -7359,9 +7347,6 @@ make_extraction (enum machine_mode mode,
}
   pos_rtx = temp;
 }
-  else if (pos_rtx != 0
-  && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
-pos_rtx = gen_lowpart (pos_mode, pos_rtx);
 
   /* Make POS_RTX unless we already have it and it is correct.  If we don't
  have a POS_RTX but we do have an ORIG_POS_RTX, the latter must


Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures

2012-10-30 Thread Xinliang David Li
On Tue, Oct 30, 2012 at 12:25 PM, Steven Bosscher  wrote:
> On Tue, Oct 30, 2012 at 5:59 PM, Teresa Johnson wrote:
>> I will try testing your patch on top of mine with our fdo benchmarks.
>
> Thanks. But you should expect a lot of errors, hopefully you can make
> something out of it for Bugzilla.
>
>> For the others on the cc list, you may need to include my patch as
>> well for testing. Without it, -freorder-blocks-and-partition was DOA
>> for me. For my patch, see
>> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html
>
> Ah, the fate of a pass that isn't enabled by default.

I see only 5 test cases under tree-prof for function splitting. More
test cases are needed.

David


> From what I can
> tell, looking at this code the past few hours, it's hopelessly broken
> at the moment:
>
> * It doesn't work with cfglayout mode, even though it is in between
> pass_into_cfg_layout_mode and pass_outof_cfg_layout_mode. Coming out
> of cfglayout mode
>
> * Coming out of cfglayout mode, fixup_reorder_chain adds
> REG_CROSSING_JUMP -- but so does force_nonfallthru_and_redirect called
> just before it. So we end up with >1 such note, or with notes on jumps
> that shouldn't have them.
>
> * The scheduler doesn't respect the partitioning at all. remove_notes
> happily removes the section split note.
>
> * etc.
>
>
> This pass may need some major work to be in good order again.
>
> Ciao!
> Steven


[7/8] BITS_BIG_ENDIAN vs. (zero_extract (const_int ...) ...)

2012-10-30 Thread Richard Sandiford
Combine tries to optimise comparisons involving:

(zero_extract (const_int X)
  (const_int 1)
  (var Y))

and so, on BITS_BIG_ENDIAN targets, it tries gamely to work out what mode
X actually has.  At the moment it tries reading the mode from operand 1
of extzv, but that doesn't feel right, since we never use extzv itself
with this combination of operands.  (We only use it with combinations
in which the first zero_extract operand is variable and the third is
constant.)  And extzv isn't necessarily a good indicator of what a
matched zero_extract does, since targets often have matchable zero_extract
insns for more than one mode.  E.g. powerpc (a BITS_BIG_ENDIAN target)
has both SImode and DImode patterns.

In practice, all BITS_BIG_ENDIAN targets that have an extzv pattern
either have an explicit word_mode operand (e.g. m68k) or leave it void,
which mode_for_extraction treats as word_mode.  Since word_mode is also
the default assumption when no extzv pattern is defined, I think it would
be more robust to assume/require word_mode across the board (as much as
anything can be called robust in this sort of situation).

Tested as described in the covering note.  OK to install?

Richard


gcc/
* combine.c (simplify_comparison): If BITS_BIG_ENDIAN, always assume
that zero_extracts of const_ints are doing word-sized extractions.

Index: gcc/combine.c
===
--- gcc/combine.c   2012-10-29 14:14:36.371315725 +
+++ gcc/combine.c   2012-10-29 14:29:26.800313546 +
@@ -11154,17 +11154,7 @@ simplify_comparison (enum rtx_code code,
  && (i = exact_log2 (UINTVAL (XEXP (op0, 0)))) >= 0)
{
  if (BITS_BIG_ENDIAN)
-   {
- enum machine_mode new_mode
-   = mode_for_extraction (EP_extzv, 1);
- if (new_mode == MAX_MACHINE_MODE)
-   i = BITS_PER_WORD - 1 - i;
- else
-   {
- mode = new_mode;
- i = (GET_MODE_PRECISION (mode) - 1 - i);
-   }
-   }
+   i = BITS_PER_WORD - 1 - i;
 
  op0 = XEXP (op0, 2);
  op1 = GEN_INT (i);


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Jakub Jelinek
On Tue, Oct 30, 2012 at 02:03:44PM -0500, Peter Bergner wrote:
> Ok, then I'll bootstrap and regtest your suggested change while we
> wait for richi to comment.  I'm fine with whatever you and richi
> decide is best.  The ObjC guys should probably test it though too.
> 
> I assume you think we should change the current trunk code as well?

Well, I haven't looked at the ObjC failures (guess they are darwin only
anyway), so have no idea whether those set section name or not.
So, if my proposed test instead of the trunk one doesn't work for darwin,
perhaps it could be DECL_PRESERVED_P (decl) || (DECL_SECTION_NAME (decl) &&
!...).  I think DECL_USER_ALIGN test is undesirable for that though, it is
just fine to increase alignment of anything, after all, it is still aligned
properly then.

Jakub


[8/8] Rename some powerpc patterns

2012-10-30 Thread Richard Sandiford
The optabs patches that I'm working on treat patterns with names like
"insvsi" and "insvdi" as part of the public target interface.
Those names clash with some existing rs6000.md patterns, so this patch adds
some _internal suffixes (a bit like fix_truncsi2_internal, for example).

AFAICT there are no clashes in other ports.

Regression-tested on powerpc64-linux-gnu.  I also checked that there
were no changes in assembly output for a set of gcc .ii files.
OK to install?

Richard


gcc/
* config/rs6000/rs6000.md (insvsi, insvdi, extzvsi, extzvdi): Rename to...
(insvsi_internal, insvdi_internal, extzvsi_internal)
(extzvdi_internal): ...this.
(insv, extzv): Update accordingly.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md 2012-10-29 09:13:00.0 +
+++ gcc/config/rs6000/rs6000.md 2012-10-29 09:50:28.846354500 +
@@ -3126,13 +3126,15 @@ (define_expand "insv"
 FAIL;
 
   if (TARGET_POWERPC64 && GET_MODE (operands[0]) == DImode)
-emit_insn (gen_insvdi (operands[0], operands[1], operands[2], 
operands[3]));
+emit_insn (gen_insvdi_internal (operands[0], operands[1], operands[2],
+   operands[3]));
   else
-emit_insn (gen_insvsi (operands[0], operands[1], operands[2], 
operands[3]));
+emit_insn (gen_insvsi_internal (operands[0], operands[1], operands[2],
+   operands[3]));
   DONE;
 }")
 
-(define_insn "insvsi"
+(define_insn "insvsi_internal"
   [(set (zero_extract:SI (match_operand:SI 0 "gpc_reg_operand" "+r")
 (match_operand:SI 1 "const_int_operand" "i")
 (match_operand:SI 2 "const_int_operand" "i"))
@@ -3267,7 +3269,7 @@ (define_insn "*insvsi_internal6"
 }"
   [(set_attr "type" "insert_word")])
 
-(define_insn "insvdi"
+(define_insn "insvdi_internal"
   [(set (zero_extract:DI (match_operand:DI 0 "gpc_reg_operand" "+r")
 (match_operand:SI 1 "const_int_operand" "i")
 (match_operand:SI 2 "const_int_operand" "i"))
@@ -3339,13 +3341,15 @@ (define_expand "extzv"
 FAIL;
 
   if (TARGET_POWERPC64 && GET_MODE (operands[1]) == DImode)
-emit_insn (gen_extzvdi (operands[0], operands[1], operands[2], 
operands[3]));
+emit_insn (gen_extzvdi_internal (operands[0], operands[1], operands[2],
+operands[3]));
   else
-emit_insn (gen_extzvsi (operands[0], operands[1], operands[2], 
operands[3]));
+emit_insn (gen_extzvsi_internal (operands[0], operands[1], operands[2],
+operands[3]));
   DONE;
 }")
 
-(define_insn "extzvsi"
+(define_insn "extzvsi_internal"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
(zero_extract:SI (match_operand:SI 1 "gpc_reg_operand" "r")
 (match_operand:SI 2 "const_int_operand" "i")
@@ -3472,7 +3476,7 @@ (define_split
(const_int 0)))]
   "")
 
-(define_insn "extzvdi"
+(define_insn "extzvdi_internal"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
(zero_extract:DI (match_operand:DI 1 "gpc_reg_operand" "r")
 (match_operand:SI 2 "const_int_operand" "i")


Patch: add @direntry for gcov

2012-10-30 Thread Tom Tromey
This patch adds a @direntry for gcov.

I noticed that it was missing today, when I tried to find the gcov
manual from the info "dir" node.  Then I found out that I had filed PR
50899 for this ages ago.

Ok?

Tom

2012-10-30  Tom Tromey  

PR other/50899
* doc/gcc.texi: Add @direntry for gcov.

diff --git a/gcc/doc/gcc.texi b/gcc/doc/gcc.texi
index 0e167ba..b1c7324 100644
--- a/gcc/doc/gcc.texi
+++ b/gcc/doc/gcc.texi
@@ -42,7 +42,7 @@
 @copying
 Copyright @copyright{} 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997,
 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
-2010 Free Software Foundation, Inc.
+2010, 2012 Free Software Foundation, Inc.
 
 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
@@ -67,6 +67,7 @@ Texts being (a) (see below), and with the Back-Cover Texts 
being (b)
 @direntry
 * gcc: (gcc).  The GNU Compiler Collection.
 * g++: (gcc).  The GNU C++ compiler.
+* gcov: (gcc) Gcov.  @command{gcov}---a test coverage program.
 @end direntry
 This file documents the use of the GNU compilers.
 @sp 1


Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708

2012-10-30 Thread Peter Bergner
On Tue, 2012-10-30 at 20:37 +0100, Jakub Jelinek wrote:
> On Tue, Oct 30, 2012 at 02:03:44PM -0500, Peter Bergner wrote:
> > Ok, then I'll bootstrap and regtest your suggested change while we
> > wait for richi to comment.  I'm fine with whatever you and richi
> > decide is best.  The ObjC guys should probably test it though too.
> > 
> > I assume you think we should change the current trunk code as well?
> 
> Well, I haven't looked at the ObjC failures (guess they are darwin only
> anyway), so have no idea whether those set section name or not.
> So, if my proposed test instead of the trunk one doesn't work for darwin,
> perhaps it could be DECL_PRESERVED_P (decl) || (DECL_SECTION_NAME (decl) &&
> !...).  I think DECL_USER_ALIGN test is undesirable for that though, it is
> just fine to increase alignment of anything, after all, it is still aligned
> properly then.

I can confirm it bootstraps and regtests without any errors...and it
fixes my problem.

Peter




