Re: GCC mini-summit - compiling for a particular architecture

2007-04-21 Thread Robert Dewar

Mike Stump wrote:

On Apr 20, 2007, at 6:42 PM, Robert Dewar wrote:

One possibility would be to have a -Om switch (or whatever) that
says "do all optimizations for this machine that help".


Ick, gross.  No.


Well OK, Ick, but below you recommend removing the overly
pedantic rule. I agree with that, but the above is a
compromise suggestion if we can't remove the rule.

So, Mike, my question is, assuming we cannot remove the
rule, what do you want to do?

a) nothing
b) something like the above
c) something else, please specify



I must say the rule about all optimizations being the same on
all machines seems odd to me


I'd look at it this way: it isn't unreasonable to have cost metrics
that are in fact different for each CPU, and possibly each tune choice,
and that greatly affect _any_ codegen choice.  Sure, we can always unroll
loops on all targets, but we can crank up the costs of extra
instructions on chips where those costs are high; net result, almost
no unrolling.  For chips where the costs are cheap and they need the
extra instructions exposed to be able to optimize further, trivially,
the costs involved are totally different.  Net result, better codegen
for each.
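To make the cost-metric argument concrete, here is a toy C model. The `struct cpu_costs` type, the cost formula and the numbers are all invented for illustration; this is not how GCC's unroller or its target cost hooks actually work. It only shows one shared heuristic reaching opposite decisions when it is fed different per-CPU costs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU cost description; the numbers used with it are invented. */
struct cpu_costs {
  const char *name;
  double extra_insn_penalty; /* cost charged for each extra copy of the loop body */
};

/* Estimated cycles per original iteration when unrolling by FACTOR: the body
   still runs once per iteration, the loop overhead (branch, increment) is paid
   once per FACTOR iterations, and every extra copy of the body costs the
   CPU-specific penalty.  */
static double unrolled_cost(double body, double overhead, int factor,
                            const struct cpu_costs *cpu)
{
  assert(factor > 0);
  return body + overhead / factor + cpu->extra_insn_penalty * (factor - 1);
}

/* The same heuristic on every target: pick the cheapest factor in {1, 2, 4, 8},
   breaking ties in favour of the smaller factor.  */
static int choose_unroll_factor(double body, double overhead,
                                const struct cpu_costs *cpu)
{
  static const int factors[] = { 1, 2, 4, 8 };
  int best = factors[0];
  double best_cost = unrolled_cost(body, overhead, best, cpu);

  for (size_t i = 1; i < sizeof factors / sizeof factors[0]; i++) {
    double c = unrolled_cost(body, overhead, factors[i], cpu);
    if (c < best_cost) {
      best_cost = c;
      best = factors[i];
    }
  }
  return best;
}
```

With a body cost of 4 and a loop overhead of 2, a CPU whose `extra_insn_penalty` is 1.0 gets factor 1 (no unrolling), while one with 0.05 gets factor 8.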


I do however think the concept of not allowing targets to set and  
unset optimization choices is, well, overly pedantic.




Re: GCC mini-summit - benchmarks

2007-04-21 Thread Jim Wilson

Kenneth Hoste wrote:
I'm not sure what 'tests' means here... Are test cases being extracted 
from the SPEC CPU2006 sources? Or are you referring to the validity tests 
of the SPEC framework itself (to check whether the output generated by 
some binary conforms with their reference output)?


The claim is that SPEC CPU2006 has source code bugs that cause it to 
fail when compiled by gcc.  We weren't given a specific list of problems.


There are known problems with older SPEC benchmarks though.  For 
instance, vortex fails on some targets unless compiled with 
-fno-strict-aliasing.
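For readers who haven't run into this: the usual cause of such failures is code that accesses an object through a pointer to an incompatible type, which -fstrict-aliasing lets GCC assume never happens. The sketch below is a generic illustration, not vortex's actual code, and it assumes a 32-bit IEEE 754 `float`; `float_bits` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Undefined behaviour in ISO C, and the kind of code that can break under
   -fstrict-aliasing, because the float object is read through a uint32_t
   lvalue:

     uint32_t bits = *(uint32_t *) &f;

   A well-defined alternative copies the object representation instead.  */
static uint32_t float_bits(float f)
{
  uint32_t bits;
  assert(sizeof bits == sizeof f); /* assumes a 32-bit float */
  memcpy(&bits, &f, sizeof bits);
  return bits;
}
```

GCC typically turns a fixed-size `memcpy` like this into a plain register move, so the well-defined version need not cost anything.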

--
Jim Wilson, GNU Tools Support, http://www.specifix.com


Re: Problem building gcc on Cygwin

2007-04-21 Thread Jim Wilson

Tom Dickens wrote:

 ../gcc/configure -enable-languages=c,c++,fortran.
make[1]: Leaving directory `/cygdrive/c/gcc-4.1.2/obj'


You ran the wrong configure script.  You must always run the toplevel 
configure script, not the one inside the gcc directory.


So instead of doing
  cd gcc-4.1.2
  mkdir obj
  cd obj
  ../gcc/configure
which will fail.  You should instead do
  mkdir obj
  cd obj
  ../gcc-4.1.2/configure
which will work.
--
Jim Wilson, GNU Tools Support, http://www.specifix.com


Re: A question on gimplifier

2007-04-21 Thread Jim Wilson

H. J. Lu wrote:

__builtin_ia32_vec_set_v2di will be expanded to
  [(set (match_operand:V2DI 0 "register_operand" "=x")
        (vec_merge:V2DI
          (vec_duplicate:V2DI
            (match_operand:DI 2 "nonimmediate_operand" "rm"))
          (match_operand:V2DI 1 "register_operand" "0")
          (match_operand:SI 3 "const_pow2_1_to_2_operand" "n")))]


Named rtl expanders aren't allowed to clobber their inputs.  You will 
need to generate a pseudo-reg temp in the expander, copy the first input 
to the temp, and then use the temp as the output/input argument.


There are probably lots of existing examples in the i386 *.md files to 
look at.  See for instance the reduc_splus_v4sf pattern in the sse.md file.
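To make the advice concrete without reproducing real .md syntax, here is a plain-C model with hypothetical names: `v2di` stands in for a V2DI register, and the two functions mirror the insn and the expander. Because the pattern ties operand 1 to operand 0 (the "0" constraint), the insn overwrites its vector input, so the expander copies that input into a fresh temporary and hands the temporary to the insn. In an actual expander the temporary would presumably come from gen_reg_rtx and the copy from emit_move_insn; this sketch only mirrors the data flow.

```c
#include <assert.h>

/* Hypothetical stand-in for a V2DI register.  */
typedef struct { long long elt[2]; } v2di;

/* Models the insn: (vec_merge (vec_duplicate x) v mask) written back into v,
   since operands 0 and 1 share a register.  A set bit in MASK takes that
   element from the duplicated scalar; a clear bit keeps the element of V.  */
static void vec_set_insn(v2di *v, long long x, unsigned mask)
{
  assert(mask == 1 || mask == 2); /* const_pow2_1_to_2_operand */
  for (int i = 0; i < 2; i++)
    if (mask & (1u << i))
      v->elt[i] = x;
}

/* Models the expander: it must not clobber its input, so it copies the input
   into a temporary, lets the insn overwrite the temporary, and returns it.  */
static v2di expand_vec_set(const v2di *input, long long x, unsigned mask)
{
  v2di tmp = *input;           /* copy the first input to the temp */
  vec_set_insn(&tmp, x, mask); /* the temp is both output and input */
  return tmp;
}
```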

--
Jim Wilson, GNU Tools Support, http://www.specifix.com


Re: GCC mini-summit - compiling for a particular architecture

2007-04-21 Thread Laurent GUERBY
On Fri, 2007-04-20 at 19:28 -0400, Robert Dewar wrote:
> Steve Ellcey wrote:
> 
> > This seems unfortunate.  I was hoping I might be able to turn on loop
> > unrolling for IA64 at -O2 to improve performance.  I have only started
> > looking into this idea but it seems to help performance quite a bit,
> > though it is also increasing size quite a bit too so it may need some
> > modification of the unrolling parameters to make it practical.
> 
> To me it is obvious that optimizations are target dependent. For
> instance loop unrolling is really a totally different optimization
> on the ia64 as a result of the rotating registers.

My feeling is that it would be much more useful to have more detailed
documentation on optimization flags in the GCC manual, at least
mentioning the type of source code and architectures where each
optimization option is interesting, rather than to mess with new flags or
change longstanding -On policies.

Look at what we're starting from:

<<
@item -funroll-loops
@opindex funroll-loops
Unroll loops whose number of iterations can be determined at compile
time or upon entry to the loop.  @option{-funroll-loops} implies
@option{-frerun-cse-after-loop}.  This option makes code larger,
and may or may not make it run faster.

@item -funroll-all-loops
@opindex funroll-all-loops
Unroll all loops, even if their number of iterations is uncertain when
the loop is entered.  This usually makes programs run more slowly.
@option{-funroll-all-loops} implies the same options as
@option{-funroll-loops},
>>

It could gain a few more paragraphs written by knowledgeable people.
And expanding documentation doesn't introduce regressions :).
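As an example of what such a paragraph might show, here is a hand-written C sketch of unrolling by four. It illustrates the transformation and is not GCC's actual output; the function names are hypothetical. The loop's trip count `n` is known on entry, which is the case -funroll-loops handles, and the unrolled version needs a remainder loop for the last 0 to 3 elements, which is part of why the option makes code larger.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: one add plus one compare-and-branch per element.  */
static long sum_simple(const long *a, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* The same loop unrolled by four: the compare-and-branch overhead is paid once
   per four elements, at the price of a bigger body and a remainder loop.  */
static long sum_unrolled4(const long *a, size_t n)
{
  long s = 0;
  size_t i = 0;

  assert(a != NULL || n == 0);
  for (; n - i >= 4; i += 4) {
    s += a[i];
    s += a[i + 1];
    s += a[i + 2];
    s += a[i + 3];
  }
  for (; i < n; i++) /* 0 to 3 leftover elements */
    s += a[i];
  return s;
}
```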

Laurent




Re: GCC mini-summit - compiling for a particular architecture

2007-04-21 Thread Mike Stump

On Apr 21, 2007, at 3:12 AM, Robert Dewar wrote:
So, Mike, my question is, assuming we cannot remove the rule what  
do you want to do


I think in the end, each situation is different and we have to find
the best solution for each situation.  So, in that spirit, let's open
a discussion for the exact case you're thinking of.


Now, the closest I've come to -Om in the past would be -fast, which  
means, tune for spec.  :-)


How do you get the benefit of -fstrict-aliasing?

2007-04-21 Thread Bradley Lucier
I've decided to try to contribute modifications to the C code
that is generated by the Gambit Scheme->C compiler so that (a) it  
doesn't have any aliasing violations and (b) more aliasing  
distinctions can be made (the car and cdr of a pair don't overlap  
with the entries of a vector, etc.).  This was in response to a  
measured 20% speedup with some numerical code with -fstrict-aliasing  
instead of -fno-strict-aliasing, nearly all of which came because gcc  
then knew that stores to a vector of doubles didn't change the values  
of variables on the stack.
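A minimal C sketch of the effect described above, assuming vector elements are accessed as `double` and stack slots as `long` (hypothetical code, not what Gambit actually generates). Under ISO C's type-based aliasing rules a store through a `double *` cannot change an object of type `long`, so with -fstrict-aliasing GCC may read `*count` once before the loop; with -fno-strict-aliasing it has to assume that every `v[i]` store might modify `*count` and reload it on each iteration.

```c
#include <assert.h>

/* Hypothetical example: v points into a vector of doubles, count into a
   stack of longs.  The loop bound *count is re-read on every iteration in
   the source; only type-based alias analysis lets the compiler hoist it.  */
static void fill_vector(double *v, const long *count)
{
  assert(*count >= 0);
  for (long i = 0; i < *count; i++)
    v[i] = 0.5 * (double) i;
}
```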


Part (a) is essentially a non-issue for user-written code, since the  
only aliasing problems of which I am aware are in the bignum library,  
so as a preliminary test I added -fstrict-aliasing to the gcc command  
line and reran the benchmark suite on a 2GHz G5.  To my surprise,  
while there were some improvements, the -fstrict-aliasing option led  
to slower code overall, in some cases quite severely (7.014 seconds  
to 11.794 seconds, for example), and, perhaps not surprisingly,  
compilation times were significantly longer.  This was true both with  
Apple's 4.0.1 and FSF 4.1.2.


So I'm wondering whether certain options have to be included on the  
command line to get the benefits of -fstrict-aliasing.  The current  
command line is


gcc -mcpu=970 -m64 -no-cpp-precomp -Wall -W -Wno-unused -O1 \
  -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing \
  -fwrapv -fexpensive-optimizations -fforce-addr -fpeephole2 -falign-jumps \
  -falign-functions -fno-function-cse -ftree-copyrename -ftree-fre -ftree-dce \
  -fregmove -fgcse-las -freorder-functions -fcaller-saves -fno-if-conversion2 \
  -foptimize-sibling-calls -fcse-skip-blocks -funit-at-a-time \
  -finline-functions -fomit-frame-pointer -fPIC -fno-common -bundle \
  -flat_namespace -undefined suppress -fstrict-aliasing


where the optimizations between -fwrapv (which is no longer  
necessary, I should remove that) and -fstrict-aliasing were chosen by  
some experiments with genetic algorithms.


I didn't think that adding aliasing information could lead to worse  
code.  So I'm wondering how to use that aliasing information more  
effectively to get better code.


Brad


Re: How do you get the benefit of -fstrict-aliasing?

2007-04-21 Thread Andrew Pinski

On 4/21/07, Bradley Lucier <[EMAIL PROTECTED]> wrote:

I didn't think that adding aliasing information could lead to worse
code.  So I'm wondering how to use that aliasing information more
effectively to get better code.


What aliasing information can do is allow an optimization pass to
increase register pressure, which causes our current RA (register
allocator) to go crazy and make the code worse.  This is true of any
optimization, even one that takes register pressure into account (which
is actually the wrong thing to do, really).
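A hand-written C illustration of this point, with hypothetical function names. Once alias analysis proves that the stores to `out[]` cannot modify `c[]`, the compiler is allowed to turn the first loop into something like the second: eight coefficients loaded once and kept live for the whole loop, on top of the loop's own values. Each saved load is only a win if the allocator can actually keep all of them in registers; if it spills instead, the "optimized" loop can end up slower.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example.  Without proof that out[] and c[] are disjoint, every
   store to out[i] might change c[], so c[0..7] must be re-read each iteration.  */
static void poly_reload(double *out, const double *x, const double *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = c[0] + x[i] * (c[1] + x[i] * (c[2] + x[i] * (c[3] + x[i] *
             (c[4] + x[i] * (c[5] + x[i] * (c[6] + x[i] * c[7]))))));
}

/* What the loop may become once out[] is known not to overlap c[]: the eight
   coefficients are loaded once and must stay in registers (or be spilled)
   for the whole loop.  Only valid if out[] and c[] really are disjoint.  */
static void poly_hoisted(double *out, const double *x, const double *c, size_t n)
{
  assert(out != c); /* partial check only; full non-overlap is what matters */
  if (n == 0)
    return;         /* the original loop would not read c[] at all */

  const double c0 = c[0], c1 = c[1], c2 = c[2], c3 = c[3];
  const double c4 = c[4], c5 = c[5], c6 = c[6], c7 = c[7];
  for (size_t i = 0; i < n; i++) {
    double t = x[i];
    out[i] = c0 + t * (c1 + t * (c2 + t * (c3 + t *
             (c4 + t * (c5 + t * (c6 + t * c7))))));
  }
}
```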

Thanks,
Andrew Pinski


maybe_infinite_loop?

2007-04-21 Thread Mike Stump
We still have some lno bits in our tree.  We tried to remove them and  
found:


gzip     +0.5%
vpr      -0.4%
gcc      -3.2%
mcf      -0.3%
crafty   +0.2%
parser   +0.2%
perlbmk  -2.2%
gap      +0.2%
vortex   -0.1%
bzip2    +1.9%
twolf    -0.7%

on x86 (probably a Core 2 Duo) in our 4.2 tree (with the rest of our
local patches).  -3.2% means roughly 3.2% better codegen with the
lno bits.  I didn't rerun the numbers for mainline to see if they are
still applicable.


Of all the LNO bits, the last major one seems to be the bit below.
I don't even know if it is responsible for the benefit we see.  I
thought I'd mention it, as a 2-3% win on two of the SPEC tests seems
worthwhile.


I'd be interested in finding someone who might be interested in
tracking down where the benefit comes from in the patch and pushing
into mainline whatever goodness there is to be had from it.  Any
takers?  If I can find someone, I'd be happy to send out the version
of the patch for mainline.  [Hmm, just 567 lines.]  On second thought,
I'll just include it at the end for reference.  Note, there is one soft
conflict to resolve in going from the 4.2 context to mainline, which
I've not yet resolved.


2004-07-13  Zdenek Dvorak  <[EMAIL PROTECTED]>

	* Makefile.in (tree-ssa-loop.o, tree-ssa-dce.o): Add function.h
	dependency.
	* builtins.c (expand_builtin): Handle BUILT_IN_MAYBE_INFINITE_LOOP.
	* builtins.def (BUILT_IN_MAYBE_INFINITE_LOOP): New builtin.
	* function.h (struct function): Add marked_maybe_inf_loops field.
	* timevar.def (TV_MARK_MILOOPS): New timevar.
	* tree-flow.h (mark_maybe_infinite_loops): Declare.
	* tree-optimize.c (init_tree_optimization_passes): Add
	pass_mark_maybe_inf_loops.
	* tree-pass.h (pass_mark_maybe_inf_loops): Declare.
	* tree-ssa-dce.c: Include function.h.
	(find_obviously_necessary_stmts): Mark back edges only if they were
	not marked already.
	(perform_tree_ssa_dce): Do not call mark_dfs_back_edges here.
	* tree-ssa-loop-niter.c (unmark_surely_finite_loop,
	mark_maybe_infinite_loops): New functions.
	* tree-ssa-loop.c: Include function.h.
	(tree_mark_maybe_inf_loops, gate_tree_mark_maybe_inf_loops,
	pass_mark_maybe_inf_loops): New pass.
	* tree-ssa-operands.c (function_ignores_memory_p): Add
	BUILT_IN_MAYBE_INFINITE_LOOP.

Doing diffs in .:
--- ./builtins.c.~1~  2007-04-13 10:06:18.0 -0700
+++ ./builtins.c  2007-04-21 15:54:01.0 -0700
@@ -6562,6 +6562,12 @@ expand_builtin (tree exp, rtx target, rt
return target;
   break;
 
+/* APPLE LOCAL begin lno */
+case BUILT_IN_MAYBE_INFINITE_LOOP:
+  /* This is just a fake statement that expands to nothing.  */
+  return const0_rtx;
+/* APPLE LOCAL end lno */
+
 case BUILT_IN_FETCH_AND_ADD_1:
 case BUILT_IN_FETCH_AND_ADD_2:
 case BUILT_IN_FETCH_AND_ADD_4:
--- ./builtins.def.~1~  2007-04-13 10:06:19.0 -0700
+++ ./builtins.def  2007-04-21 15:54:01.0 -0700
@@ -639,6 +639,8 @@ DEF_LIB_BUILTIN(BUILT_IN_FREE, "
 DEF_GCC_BUILTIN(BUILT_IN_FROB_RETURN_ADDR, "frob_return_addr", BT_FN_PTR_PTR, ATTR_NULL)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_GETTEXT, "gettext", BT_FN_STRING_CONST_STRING, ATTR_FORMAT_ARG_1)
 DEF_C99_BUILTIN(BUILT_IN_IMAXABS, "imaxabs", BT_FN_INTMAX_INTMAX, ATTR_CONST_NOTHROW_LIST)
+/* APPLE LOCAL lno */
+DEF_GCC_BUILTIN(BUILT_IN_MAYBE_INFINITE_LOOP, "maybe_infinite_loop", BT_FN_VOID, ATTR_NULL)
 DEF_GCC_BUILTIN(BUILT_IN_INIT_DWARF_REG_SIZES, "init_dwarf_reg_size_table", BT_FN_VOID_PTR, ATTR_NULL)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FINITE, "finite", BT_FN_INT_DOUBLE, ATTR_CONST_NOTHROW_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FINITEF, "finitef", BT_FN_INT_FLOAT, ATTR_CONST_NOTHROW_LIST)
--- ./cfghooks.c.~1~  2007-02-12 20:10:38.0 -0800
+++ ./cfghooks.c  2007-04-21 15:59:31.0 -0700
@@ -405,6 +405,10 @@ edge
 split_block (basic_block bb, void *i)
 {
   basic_block new_bb;
+  /* APPLE LOCAL begin lno */
+  bool irr = (bb->flags & BB_IRREDUCIBLE_LOOP) != 0;
+  int flags = EDGE_FALLTHRU;
+  /* APPLE LOCAL end lno */
 
   if (!cfg_hooks->split_block)
 internal_error ("%s does not support split_block", cfg_hooks->name);
@@ -416,6 +420,13 @@ split_block (basic_block bb, void *i)
   new_bb->count = bb->count;
   new_bb->frequency = bb->frequency;
   new_bb->loop_depth = bb->loop_depth;
+  /* APPLE LOCAL begin lno */
+  if (irr)
+{
+  new_bb->flags |= BB_IRREDUCIBLE_LOOP;
+  flags |= EDGE_IRREDUCIBLE_LOOP;
+}
+  /* APPLE LOCAL end lno */
 
   if (dom_info_available_p (CDI_DOMINATORS))
 {
@@ -560,6 +571,15 @@ split_edge (edge e)
}
 }
 
+  /* APPLE LOCAL begin lno */
+  if (irr)
+{
+  ret->flags |= BB_IRREDUCIBLE_LOOP;
+  EDGE_PRED (ret, 0)->flags |= EDGE_IRREDUCIBLE_LOOP;
+  EDGE_SUCC (ret, 0)->flags |= EDGE_IR

Re: maybe_infinite_loop?

2007-04-21 Thread Andrew Pinski

On 4/21/07, Mike Stump <[EMAIL PROTECTED]> wrote:

We still have some lno bits in our tree.  We tried to remove them and
found:
Of all the LNO bits, the last major bits seems to be the below bit.
I don't even know if it is responsible for the benefit we see.  I
thought I'd mention it, as a 2-3% win on two of the spec tests seems
worthwhile.


The only benefit, as far as I can tell, is an extra call at the
tree level, which could cause aliasing analysis to go wrong with
call-clobbered variables.  The remove-empty-loop pass in 4.1.0 and above
removes more empty loops than the LNO patch could ever remove.  So
really, I think you are just seeing bogus effects of slightly different
aliasing and register pressure.  Nothing to get your hopes up about,
anyway.
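A hypothetical C illustration of the two kinds of empty loop at issue. The first has a trip count known on entry and no side effects, so it is surely finite and an empty-loop removal pass can delete it. The second is the Collatz iteration: the compiler certainly cannot prove it terminates (x == 0, for instance, never reaches 1), so deleting it could turn a hang into a return. Going by the ChangeLog above, the LNO patch appears to mark loops of the second kind with the `BUILT_IN_MAYBE_INFINITE_LOOP` builtin, which expands to nothing, so that DCE keeps them while loops of the first kind can still be removed; this reading is an inference from the ChangeLog, not something stated in the thread.

```c
#include <assert.h>

/* Surely finite: the trip count n is known on entry and the body has no
   side effects, so an empty-loop removal pass may delete the whole loop.  */
static void spin_counted(unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    ; /* nothing */
}

/* Possibly infinite as far as the compiler knows: termination is not provable
   (and x == 0 would loop forever), so removing the loop is not obviously safe.  */
static void spin_collatz(unsigned long long x)
{
  assert(x != 0);
  while (x != 1)
    x = (x % 2 == 0) ? x / 2 : 3 * x + 1;
}
```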

Thanks,
Andrew Pinski