Re: ISO C prototype style for libiberty?

2005-03-26 Thread Zack Weinberg
DJ Delorie <[EMAIL PROTECTED]> writes:

>> Less to maintain is all I was hoping for.  I think the configure
>> scripts (both libiberty's and gcc's) could be simplified quite a bit
>> if we assumed a C89 compliant runtime library, as could libiberty.h
>> and system.h.
>
> Well, gcc can make assumptions libiberty can't, and as far as
> libiberty's configure goes, "if it ain't broke, don't fix it" seems to
> be the best course.

Fair enough.  Particularly since I do not have current plans to do
anything but speculate.  Hands full with other stuff.

> A target environment, for example, may use libiberty to *provide* c89
> support functions for its runtime library.

Isn't that what newlib is for...?  I should be clear, though; I only
want to make this assumption for the host and build systems.

zw


bootstrap compare failure in ada/targparm.o on i686-pc-linux-gnu?

2005-03-26 Thread Alexandre Oliva
I've been getting bootstrap failures on i686-pc-linux-gnu for the past
few days, with --enable-languages=all,ada, on Fedora Core devel.
There are indeed differences between stage2 and stage3 targparm.o, but
upon visual inspection, they appear to be harmless, possibly some
effect of FP extra precision or so in branch prediction or basic block
ordering (wild guesses).  Here are the differences I get:

diff <(objdump -dr gcc/ada/targparm.o) <(objdump -dr gcc/stage2/ada/targparm.o)
2c2
< gcc/ada/targparm.o: file format elf32-i386
---
> gcc/stage2/ada/targparm.o: file format elf32-i386
659,660c659,660
<  936: 83 bd b8 fd ff ff 02  cmpl   $0x2,0xfdb8(%ebp)
<  93d: 0f 84 49 0a 00 00     je     138c
---
>  936: 83 bd b8 fd ff ff 01  cmpl   $0x1,0xfdb8(%ebp)
>  93d: 0f 84 7a 0b 00 00     je     14bd
670c670
<  973: 0f 84 19 0a 00 00     je     1392
---
>  973: 0f 84 c7 09 00 00     je     1340
1027c1027
<  e77: 0f 85 db 05 00 00     jne    1458
---
>  e77: 0f 85 d0 05 00 00     jne    144d
1073c1073
<  f19: 0f 85 26 04 00 00     jne    1345
---
>  f19: 0f 85 e7 04 00 00     jne    1406
1351,1435c1351,1436
< 1340: e9 9a f2 ff ff        jmp    5df
< 1345: 8b 75 08              mov    0x8(%ebp),%esi
< 1348: 01 de                 add    %ebx,%esi
< 134a: 89 b5 c4 fd ff ff     mov    %esi,0xfdc4(%ebp)
< 1350: 0f b6 c1              movzbl %cl,%eax
< 1353: 89 04 24              mov    %eax,(%esp)
< 1356: e8 fc ff ff ff        call   1357
<             1357: R_386_PC32  output__write_char
< 135b: ff 85 c4 fd ff ff     incl   0xfdc4(%ebp)
< 1361: 8b 95 d8 fd ff ff     mov    0xfdd8(%ebp),%edx
< 1367: 8b 85 c4 fd ff ff     mov    0xfdc4(%ebp),%eax
< 136d: 42                    inc    %edx
< 136e: 89 95 d8 fd ff ff     mov    %edx,0xfdd8(%ebp)
< 1374: 0f b6 08              movzbl (%eax),%ecx
< 1377: 80 f9 0a              cmp    $0xa,%cl
< 137a: 0f 95 c2              setne  %dl
< 137d: 80 f9 29              cmp    $0x29,%cl
< 1380: 0f 95 c0              setne  %al
< 1383: 84 d0                 test   %dl,%al
< 1385: 75 c9                 jne    1350
< 1387: e9 93 fb ff ff        jmp    f1f
< 138c: 8b a5 04 fd ff ff     mov    0xfd04(%ebp),%esp
< 1392: e8 fc ff ff ff        call   1393
<             1393: R_386_PC32  output__set_standard_error
< 1397: b8 f4 00 00 00        mov    $0xf4,%eax
<             1398: R_386_32    .rodata
< 139c: 89 85 08 fd ff ff     mov    %eax,0xfd08(%ebp)
< 13a2: b8 58 00 00 00        mov    $0x58,%eax
<             13a3: R_386_32    .rodata
< 13a7: 89 85 0c fd ff ff     mov    %eax,0xfd0c(%ebp)
< 13ad: 8b 85 08 fd ff ff     mov    0xfd08(%ebp),%eax
< 13b3: 8b 95 0c fd ff ff     mov    0xfd0c(%ebp),%edx
< 13b9: 89 04 24              mov    %eax,(%esp)
< 13bc: 89 54 24 04           mov    %edx,0x4(%esp)
< 13c0: e8 fc ff ff ff        call   13c1
<             13c1: R_386_PC32  output__write_line
< 13c5: b8 24 01 00 00        mov    $0x124,%eax
<             13c6: R_386_32    .rodata
< 13ca: 89 85 10 fd ff ff     mov    %eax,0xfd10(%ebp)
< 13d0: b8 50 00 00 00        mov    $0x50,%eax
<             13d1: R_386_32    .rodata
< 13d5: 89 85 14 fd ff ff     mov    %eax,0xfd14(%ebp)
< 13db: 8b 95 10 fd ff ff     mov    0xfd10(%ebp),%edx
< 13e1: 8b 8d 14 fd ff ff     mov    0xfd14(%ebp),%ecx
< 13e7: 89 14 24              mov    %edx,(%esp)
< 13ea: 89 4c 24 04           mov    %ecx,0x4(%esp)
< 13ee: e8 fc ff ff ff        call   13ef
<             13ef: R_386_PC32  output__write_str
< 13f3: 8b 5d 08              mov    0x8(%ebp),%ebx
< 13f6: 8b b5 d8 fd ff ff     mov    0xfdd8(%ebp),%esi
< 13fc: 0f b6 0c 33           movzbl (%ebx,%esi,1),%ecx
< 1400: 80 f9 0a              cmp    $0xa,%cl
< 1403: 0f 95 c2              setne  %dl
< 1406: 80 f9 29              cmp    $0x29,%cl
< 1409: 0f 95 c0              setne  %al
< 140c: 84 d0                 test   %dl,%al
< 140e: 0f 84 0b fb ff ff     je     f1f
< 1414: 01 f3                 add    %esi,%ebx
< 1416: 89 9d c8 fd ff ff     mov    %ebx,0xfdc8(%ebp)
< 141c: 0f b6 c1              movzbl %cl,%eax
< 141f: 89 04 24              mov    %eax,(%esp)
< 1422: e8 fc ff ff ff        call   1423
<             1423: R_386_

Re: A plan for eliminating cc0

2005-03-26 Thread Steven Bosscher
On Saturday 26 March 2005 04:11, Ian Lance Taylor wrote:
> I'm also not aware of processors changing as you describe,

Well, ia64 comes to mind.  Take the cmp4.* instructions for example.
They are of the form "(predicate) cmp4.cmpoperator p1,p2 = cmpoperands"
where p1 and p2 are predicate registers that are both set to something
depending on the result of the comparison.

Guessing and hand waving starts here... ;-)
As far as I can tell, there is no architecture requirement that p1 and
p2 must be a register pair (i.e. p6,p7 or p2,p3, etc.), but that seems
to be the only form that GCC can produce.  All cmp4 patterns in ia64.md
have a single set to p1, and an assembler output template of the form
"cmp4.* %0, %I0 = %3, %2" where the I means "Invert a predicate register
by adding 1".
Perhaps this was done this way to avoid insn patterns with two sets?

Gr.
Steven
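To make the "%I0" trick concrete, here is a small C sketch. The function name format_cmp4 and its parameters are hypothetical, and this is not the ia64 back end's print_operand; it only models a template that names operand 0 and "operand 0 plus one", which is why only adjacent pairs such as p6,p7 can come out of such a pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the template "cmp4.OP %0, %I0 = %3, %2":
   PREG is the predicate register bound to operand 0, and the second
   target is always PREG + 1, so the two targets are forced to be an
   adjacent pair.  R2 and R3 stand in for operands 2 and 3.  */
static void
format_cmp4 (char *buf, size_t len, const char *op, int preg, int r2, int r3)
{
  /* ia64 has 64 predicate registers, p0..p63, so PREG + 1 must exist.  */
  assert (preg >= 0 && preg + 1 < 64);
  int n = snprintf (buf, len, "cmp4.%s p%d, p%d = r%d, r%d",
                    op, preg, preg + 1, r3, r2);
  assert (n >= 0 && (size_t) n < len);  /* output was not truncated */
}
```

With preg = 6 the second target can only be p7; a pattern that wanted, say, p6 and p9 would need a second SET (or a second operand) to say so.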


Re: bootstrap compare failure in ada/targparm.o on i686-pc-linux-gnu?

2005-03-26 Thread Graham Stott
Hi Alex,
I do regular bootstraps of mainline with all languages on FC3 i686-pc-linux-gnu
and haven't seen any problems up to Friday. I'm using
--enable-checking=tree,misc,rtl,rtlflag, which might make a difference.
Cheers
Graham


Re: bootstrap compare failure in ada/targparm.o on i686-pc-linux-gnu?

2005-03-26 Thread Eric Botcazou
> I do regular bootstraps of mainline all languages on FC3
> i686-pc-linux-gnu and haven't seen any problems up to Friday. I'm using
> --enable-checking=tree,misc,rtl,rtlflag which might make a difference.

You should add 'assert' with 4.x, otherwise you miss the simple assertions.

-- 
Eric Botcazou


Re: Heads up: 4.0 libjava failures on powerpc-apple-darwin7.8.0

2005-03-26 Thread Bradley Lucier
On Mar 25, 2005, at 1:22 PM, Tom Tromey wrote:
> "Brad" == Bradley Lucier <[EMAIL PROTECTED]> writes:
> Brad> http://gcc.gnu.org/ml/gcc-testresults/2005-03/msg01559.html
> I didn't see more recent results, but I suspect this problem has been
> fixed.
It seems that the libjava tests have been turned off, so it depends on 
the meaning of "fixed":

http://gcc.gnu.org/ml/gcc-testresults/2005-03/msg01749.html
I'm sorry, I don't understand what's going on, but it doesn't look good.
Brad


Re: building GCC 4.0 for arm-elf target on mingw host

2005-03-26 Thread E. Weddington
Dave Murphy wrote:
> After 3 or 4 restarts it finally appears to proceed normally until
> building libgcc:
>
> make[3]: Leaving directory
> `/c/projects/devkitPro/sources/arm-elf/gcc/gcc'
> /c/projects/devkitPro/sources/arm-elf/gcc/gcc/xgcc
> -B/c/projects/devkitPro/sources/arm-elf/gcc/gcc/
> -Bc:/devkitARM_r12/arm-elf/bin/ -Bc:/devkitARM_r12/arm-elf/lib/
> -isystem c:/devkitARM_r12/arm-elf/include -isystem
> c:/devkitARM_r12/arm-elf/sys-include -O2  -DIN_GCC -DCROSS_COMPILE
> -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes
> -Wold-style-definition  -isystem ./include  -Dinhibit_libc -fno-inline
> -g  -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc -I. -I
> -I../../../gcc-4.0-20050319-new/gcc
> -I../../../gcc-4.0-20050319-new/gcc/
> -I../../../gcc-4.0-20050319-new/gcc/../include
> -I../../../gcc-4.0-20050319-new/gcc/../libcpp/include  -DL_muldi3 -c
> ../../../gcc-4.0-20050319-new/gcc/libgcc2.c -o libgcc/./_muldi3.o
> In file included from ../../../gcc-4.0-20050319-new/gcc/libgcc2.c:43:
> ./tm.h:5:28: error: config/dbxelf.h: No such file or directory
> ./tm.h:6:27: error: config/elfos.h: No such file or directory
> ./tm.h:7:37: error: config/arm/unknown-elf.h: No such file or directory
> ./tm.h:8:29: error: config/arm/elf.h: No such file or directory
> ./tm.h:9:30: error: config/arm/aout.h: No such file or directory
> ./tm.h:10:29: error: config/arm/arm.h: No such file or directory
> ./tm.h:11:23: error: defaults.h: No such file or directory
> In file included from ../../../gcc-4.0-20050319-new/gcc/libgcc2.c:56:
> ../../../gcc-4.0-20050319-new/gcc/libgcc2.h:230:3: error: #error
> "expand the table"
> ../../../gcc-4.0-20050319-new/gcc/libgcc2.c: In function '__mulhi3':
> ../../../gcc-4.0-20050319-new/gcc/libgcc2.c:527: error:
> 'BITS_PER_UNIT' undeclared (first use in this function)
> ../../../gcc-4.0-20050319-new/gcc/libgcc2.c:527: error: (Each
> undeclared identifier is reported only once
> ../../../gcc-4.0-20050319-new/gcc/libgcc2.c:527: error: for each
> function it appears in.)
> make[2]: *** [libgcc/./_muldi3.o] Error 1
> make[2]: Leaving directory
> `/c/projects/devkitPro/sources/arm-elf/gcc/gcc'
> make[1]: *** [stmp-multilib] Error 2
> make[1]: Leaving directory
> `/c/projects/devkitPro/sources/arm-elf/gcc/gcc'
> make: *** [all-gcc] Error 2
>
> Copying the compile line and removing the spurious -I and the
> -I../../../gcc-4.0-20050319-new/gcc/ results in no errors.
>
> I'm having a little trouble finding where this line is built up in the
> makefiles, can anyone point me in the right direction to solve this
> problem?

Interesting.
I just got a similar error with building an avr cross in the latest
MinGW/MSYS for gcc 3.4.3. Reported here:


Now I'm wondering whether it's a gcc bug or if it's an MSYS bug. I can 
successfully build gcc for the avr target using cygwin with -mno-cygwin 
and explicitly setting the build and host.

Eric
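One possible reading of the spurious -I, offered as an assumption rather than a diagnosis from the thread: GCC accepts -I either joined (-Idir) or as a separate argument (-I dir), so a bare -I swallows the next word, here -I../../../gcc-4.0-20050319-new/gcc, as a literal directory name. That would leave only the trailing-slash spelling of the source directory on the search path. The helper below is a hypothetical, simplified model of that parsing rule, not GCC's option handling.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of "-I" parsing: the directory is either joined
   ("-Idir") or the next argument ("-I dir"), whatever that word is.
   Stores directories in DIRS and returns how many were found.  */
static int
collect_include_dirs (int argc, const char *const argv[],
                      const char *dirs[], int max_dirs)
{
  int n = 0;
  for (int i = 0; i < argc; i++)
    {
      if (strncmp (argv[i], "-I", 2) != 0)
        continue;
      const char *dir;
      if (argv[i][2] != '\0')
        dir = argv[i] + 2;      /* joined form: -Idir */
      else if (i + 1 < argc)
        dir = argv[++i];        /* separate form: consumes the next word */
      else
        continue;               /* trailing bare -I: ignored in this model */
      assert (n < max_dirs);
      dirs[n++] = dir;
    }
  return n;
}
```

Fed the four -I words from the failing command line, the model yields ".", then the literal directory "-I../../../gcc-4.0-20050319-new/gcc", then "../../../gcc-4.0-20050319-new/gcc/".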


Re: GCC3 to GCC4 performance regression. Bug?

2005-03-26 Thread Steven Bosscher
On Friday 25 March 2005 02:09, James E Wilson wrote:
> I tried it, it doesn't help.  It solves neither the loop invariant code
> motion problem nor the do-loop optimization problem.

As pointed out by Andrew Pinski, the do-loop transformation was in fact
not valid.  The rest of the slowdown looks like an RTL alias analysis
problem that, as far as I can tell, loop.c could not help fixing.  If
you make the .vars dump from GCC4 compilable, and you build that with
both GCC3 and GCC4, GCC4 (of course) performs as well (bad) as for the
original test case, and the GCC3 binary slows down to the same speed as
the GCC4 one (runtime of 16.4s with GCC3 vs. 16.7s for GCC4).  Usually,
this means IVopts did something that confuses the RTL alias analysis,
which is already just poor for ia64 anyway.

Gr.
Steven



Merging CCP and VRP?

2005-03-26 Thread Kazu Hirata
Hi Diego,

Have you considered merging CCP and VRP (as suggested by Kenny last
year at the summit)?

Originally I was thinking that ASSERT_EXPRs, or ranges gathered by VRP
rather, were very useful for jump threading, but they seem to be
useful for constant propagation, too.  Consider

void bar (int);

void
foo (int a)
{
  if (a == 0)
    bar (a);
}

At the end of VRP we still have

:
  if (a_1 == 0) goto ; else goto ;

:;
  a_2 = 0;
  bar (a_2);  <-- a_2 isn't replaced with 0 yet!

Note that we don't have bar (0) at the end.  This is because currently
VRP does not have the equivalent of substitute_and_fold.  We just use
the range information to fold COND_EXPR; we don't fold each statement
using constants and ranges gathered by VRP.
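The missing step can be pictured with a toy range type: on the edge where a_1 == 0 holds, the range for a_2 collapses to [0, 0], and a singleton range is exactly the case where a use such as bar (a_2) could become bar (0). The names below (struct range, range_intersect, range_singleton_p) are hypothetical and far simpler than tree-vrp.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy value range [min, max] for an int SSA name (hypothetical).  */
struct range { int min, max; };

static const struct range range_varying = { INT_MIN, INT_MAX };

/* Narrow a range with what an ASSERT_EXPR on one edge says.  */
static struct range
range_intersect (struct range a, struct range b)
{
  struct range r = { a.min > b.min ? a.min : b.min,
                     a.max < b.max ? a.max : b.max };
  assert (r.min <= r.max);  /* an empty range would mean a dead edge */
  return r;
}

/* The substitute-and-fold step: if the range pins the name to one
   value, its uses can be replaced by that constant.  */
static bool
range_singleton_p (struct range r, int *value)
{
  if (r.min != r.max)
    return false;
  *value = r.min;
  return true;
}
```

A pass with substitute_and_fold-like behavior would run a check of this kind over every use after propagation, not only over COND_EXPRs.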

We could have VRP call substitute_and_fold, but then once we do that,
the distinction between CCP and VRP would become less clear.  Plus,
propagating conditional equivalences may have a chain effect on
simplification, especially with inlining.  That is, an equivalence
like "a = 0" above may end up massively simplifying the code.  For the
obvious reason, we don't want to do

  do
    {
      Run CCP;
      Insert ASSERT_EXPRs;
      Run VRP;
      Remove ASSERT_EXPRs;
    }
  while (something changes);

So I am thinking about inserting ASSERT_EXPRs up front *before* going
into SSA, without much pruning, and then running an enhanced version of
CCP that includes ranges and anti-ranges in the lattice, all of which
Kenny suggested last year.  I'm thinking about keeping
ASSERT_EXPRs until it's difficult to keep them.  I don't know much
about loop optimizers, so if I were to write code to keep
ASSERT_EXPRs, I might give up by turning ASSERT_EXPRs into copy
statements just before hitting loop optimizers.  :-) I have not
figured out how to deal with ASSERT_EXPRs in FRE, but Daniel Berlin
says I just have to teach tree-vn.c how to deal with it.
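One way to picture a CCP lattice that holds constants, ranges and anti-ranges together is the meet operation below, as it would be applied at PHI nodes. This is a hypothetical sketch of the idea, not code from GCC: a constant is encoded as a RANGE whose bounds are equal, and the mixed RANGE/ANTI_RANGE case falls back to VARYING whenever the exact answer would not be a single interval.

```c
#include <assert.h>

/* Hypothetical lattice: UNDEFINED is the top, VARYING the bottom.
   A constant is a RANGE with min == max.  */
enum kind { UNDEFINED, RANGE, ANTI_RANGE, VARYING };

struct lattice
{
  enum kind kind;
  int min, max;  /* RANGE: value in [min, max]; ANTI_RANGE: value outside it */
};

static const struct lattice varying = { VARYING, 0, 0 };

static int imin (int a, int b) { return a < b ? a : b; }
static int imax (int a, int b) { return a > b ? a : b; }

/* Meet at a PHI node: the result must cover every value either
   argument may take, so information only moves toward VARYING.  */
static struct lattice
meet (struct lattice a, struct lattice b)
{
  if (a.kind == UNDEFINED)
    return b;
  if (b.kind == UNDEFINED)
    return a;
  if (a.kind == VARYING || b.kind == VARYING)
    return varying;
  assert (a.min <= a.max && b.min <= b.max);

  if (a.kind == RANGE && b.kind == RANGE)
    {
      /* Smallest single range containing both.  */
      struct lattice r = { RANGE, imin (a.min, b.min), imax (a.max, b.max) };
      return r;
    }

  if (a.kind == ANTI_RANGE && b.kind == ANTI_RANGE)
    {
      /* A value stays excluded only if both sides exclude it.  */
      int lo = imax (a.min, b.min), hi = imin (a.max, b.max);
      if (lo > hi)
        return varying;
      struct lattice r = { ANTI_RANGE, lo, hi };
      return r;
    }

  /* One RANGE and one ANTI_RANGE.  */
  struct lattice anti = a.kind == ANTI_RANGE ? a : b;
  struct lattice rng = a.kind == RANGE ? a : b;
  if (rng.max < anti.min || rng.min > anti.max)
    return anti;   /* the range never hits the excluded values */
  return varying;  /* conservative: the exact result may not be one interval */
}
```

For example, meeting "a != 0" (ANTI_RANGE [0, 0]) with the constant 5 keeps "a != 0", while meeting it with the constant 0 drops to VARYING.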

One note about the order of optimizations.  I think it's a good idea
to run an enhanced version of CCP before copy-prop because the
propagation engine can deal with the presence of copies very well, whether
copy statements or PHI nodes.  If we run copy prop after an enhanced
version of CCP, we would still have useful information in
SSA_NAME_VALUE_RANGE at the end.  Copy prop only kills newer copies;
it doesn't even touch SSA_NAME_VALUE_RANGE stored in older copies.

Last but not least, I'm willing to do the work, but I'd like to be
more or less on the same page with other people hacking these scalar
optimizations before I do any significant work.

Thoughts?

p.s.
By the way, looking at Kenny's slides from last year, one thing we
have not implemented in our propagation engine is to process the CFG
worklist in the DFS order of a dominator tree.  I haven't looked
closely at this, but if the speed of propagation is a concern, we
should come back to this.
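Processing blocks in DFS order of the dominator tree needs little more than the immediate-dominator array. The helper below is a hypothetical sketch (block numbering, MAX_BLOCKS and the encoding of the entry block's idom as -1 are assumptions); it produces a preorder in which every block comes after all of its dominators.

```c
#include <assert.h>

#define MAX_BLOCKS 64

/* DFS preorder of the dominator tree.  IDOM[b] is the immediate
   dominator of block B, or -1 for the entry block.  Writes the order
   to ORDER and returns the number of blocks visited.  */
static int
dom_tree_preorder (int n, const int idom[], int entry, int order[])
{
  int first_child[MAX_BLOCKS], next_sibling[MAX_BLOCKS];
  int stack[MAX_BLOCKS], sp = 0, count = 0;

  assert (n > 0 && n <= MAX_BLOCKS);
  for (int b = 0; b < n; b++)
    first_child[b] = next_sibling[b] = -1;

  /* Build child lists; walking backwards keeps children in
     ascending block-number order.  */
  for (int b = n - 1; b >= 0; b--)
    if (idom[b] >= 0)
      {
        assert (idom[b] < n);
        next_sibling[b] = first_child[idom[b]];
        first_child[idom[b]] = b;
      }

  stack[sp++] = entry;
  while (sp > 0)
    {
      int b = stack[--sp];
      order[count++] = b;
      /* Push children in reverse so the lowest-numbered one pops first.  */
      int kids[MAX_BLOCKS], nk = 0;
      for (int c = first_child[b]; c != -1; c = next_sibling[c])
        kids[nk++] = c;
      while (nk > 0)
        stack[sp++] = kids[--nk];
    }
  return count;
}
```

A propagation engine could use this order to seed its CFG worklist, so that facts from dominating blocks tend to be available before their dominated blocks are visited.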

Kazu Hirata


Re: [rtl-optimization] Improve Data Prefetch for IA-64

2005-03-26 Thread Richard Guenther
On Sat, 26 Mar 2005 02:17:58 +0100, Steven Bosscher <[EMAIL PROTECTED]> wrote:
> On Saturday 26 March 2005 02:22, Canqun Yang wrote:
> > * loop.c (PREFETCH_BLOCKS_BEFORE_LOOP_MAX): Defined conditionally.
> > (scan_loop): Change extra_size from 16 to 128.
> > (emit_prefetch_instructions): Don't ignore all prefetches within
> > loop.
> 
> OK, so I know this is not a popular subject, but can we *please* stop
> working on loop.c and focus on getting the new RTL and tree loop passes
> to do what we want?  All this loop.c patching is a typical example of
> why free software development does not always work: always going for
> the low-hanging fruit.  In this case, there have been several attempts
> to replace the prefetching stuff in loop.c with something better.  On
> the rtl-opt branch there is a new RTL loop-prefetch.c, and on the LNO
> branch there is a re-use analysis based prefetching pass.  Why don't
> you try to finish and improve those passes, instead of making it yet
> again harder to remove loop.c.  This one file is a *huge* problem for
> just about the entire RTL optimizer path.  It is, for example, the
> reason why there is no profile information available before this old
> piece of, if I may say, junk runs, and it is the only reason why a great
> many functions in for example jump.c and the various cfg*.c files can
> still not be removed.

Why can't we just kill this beast on HEAD _now_ and in this way force
people who experience regressions to work on the new loop optimizer?
We've been waiting for that to happen since 3.4...

Richard.
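For readers less familiar with what the loop.c prefetch code emits, the source-level equivalent is a prefetch some distance ahead of the current access. This sketch uses GCC's __builtin_prefetch (address, rw, locality); PREFETCH_DISTANCE is an assumed tuning value, unrelated to PREFETCH_BLOCKS_BEFORE_LOOP_MAX or the extra_size change in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* elements ahead; hypothetical tuning value */

/* Sum an array while prefetching data a fixed distance ahead.  */
static double
sum_with_prefetch (const double *a, size_t n)
{
  double sum = 0.0;

  assert (a != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      /* Only form addresses inside the array: the prefetch is just a
         hint, but computing a pointer far past the end is undefined.  */
      if (i + PREFETCH_DISTANCE < n)
        __builtin_prefetch (&a[i + PREFETCH_DISTANCE], 0 /* read */,
                            3 /* high temporal locality */);
      sum += a[i];
    }
  return sum;
}
```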


gcc-4.0-20050326 is now available

2005-03-26 Thread gccadmin
Snapshot gcc-4.0-20050326 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.0-20050326/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.0 CVS branch
with the following options: -rgcc-ss-4_0-20050326 

You'll find:

gcc-4.0-20050326.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.0-20050326.tar.bz2 C front end and core compiler

gcc-ada-4.0-20050326.tar.bz2  Ada front end and runtime

gcc-fortran-4.0-20050326.tar.bz2  Fortran front end and runtime

gcc-g++-4.0-20050326.tar.bz2  C++ front end and runtime

gcc-java-4.0-20050326.tar.bz2 Java front end and runtime

gcc-objc-4.0-20050326.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.0-20050326.tar.bz2    The GCC testsuite

Diffs from 4.0-20050319 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.0
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: building GCC 4.0 for arm-elf target on mingw host

2005-03-26 Thread Dave Murphy
E. Weddington wrote:
> Dave Murphy wrote:
>> Copying the compile line and removing the spurious -I and the
>> -I../../../gcc-4.0-20050319-new/gcc/ results in no errors.
>>
>> I'm having a little trouble finding where this line is built up in
>> the makefiles, can anyone point me in the right direction to solve
>> this problem?
>
> Interesting.
> I just got a similar error with building an avr cross in the latest
> MinGW/MSYS for gcc 3.4.3. Reported here:
>
>
> Now I'm wondering whether it's a gcc bug or if it's an MSYS bug. I can
> successfully build gcc for the avr target using cygwin with
> -mno-cygwin and explicitly setting the build and host.

Danny pointed me to a patch on the mingw mailing list which fixes it for 
me on win2kpro

http://sourceforge.net/tracker/?func=detail&atid=102435&aid=1053052&group_id=2435
Dave


Re: [rtl-optimization] Improve Data Prefetch for IA-64

2005-03-26 Thread Canqun Yang
The last ChangeLog of rtlopt-branch was written in 
2003. After more than one year, many improvements in
this branch haven't been put into the GCC HEAD. Why? 

Quoting Steven Bosscher <[EMAIL PROTECTED]>:

> On Saturday 26 March 2005 02:22, Canqun Yang wrote:
> >         * loop.c (PREFETCH_BLOCKS_BEFORE_LOOP_MAX): Defined conditionally.
> >         (scan_loop): Change extra_size from 16 to 128.
> >         (emit_prefetch_instructions): Don't ignore all prefetches within
> >         loop.
>
> OK, so I know this is not a popular subject, but can we *please* stop
> working on loop.c and focus on getting the new RTL and tree loop passes
> to do what we want?  All this loop.c patching is a typical example of
> why free software development does not always work: always going for
> the low-hanging fruit.  In this case, there have been several attempts
> to replace the prefetching stuff in loop.c with something better.  On
> the rtl-opt branch there is a new RTL loop-prefetch.c, and on the LNO
> branch there is a re-use analysis based prefetching pass.  Why don't
> you try to finish and improve those passes, instead of making it yet
> again harder to remove loop.c.  This one file is a *huge* problem for
> just about the entire RTL optimizer path.  It is, for example, the
> reason why there is no profile information available before this old
> piece of, if I may say, junk runs, and it is the only reason why a great
> many functions in for example jump.c and the various cfg*.c files can
> still not be removed.
>
> Gr.
> Steven
>



Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.


Re: [rtl-optimization] Improve Data Prefetch for IA-64

2005-03-26 Thread Steven Bosscher
On Sunday 27 March 2005 03:53, Canqun Yang wrote:
> The last ChangeLog of rtlopt-branch was written in
> 2003. After more than one year, many improvements in
> this branch haven't been put into the GCC HEAD. Why?

Almost all of the rtlopt branch was merged.  Prefetching is one
of the few things that was not, probably Zdenek knows why.

Gr.
Steven


Re: [rtl-optimization] Improve Data Prefetch for IA-64

2005-03-26 Thread Canqun Yang
Quoting Steven Bosscher <[EMAIL PROTECTED]>:

> On Sunday 27 March 2005 03:53, Canqun Yang wrote:
> > The last ChangeLog of rtlopt-branch was written in
> > 2003. After more than one year, many improvements in
> > this branch haven't been put into the GCC HEAD. Why?
>
> Almost all of the rtlopt branch was merged.  Prefetching is one
> of the few things that was not, probably Zdenek knows why.
>

Another question is why the new RTL loop unroller does
not support giv splitting. According to my tests with the old
unroller, it is very useful. Is anyone planning to implement it? The
author of the new loop unroller, or someone who is familiar with that
part, would carry it out better and faster, I think.

> Gr.
> Steven
> 


Canqun Yang
Creative Compiler Research Group.
National University of Defense Technology, China.
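Roughly, splitting an induction variable when unrolling means the unrolled copies address memory at constant offsets from one base, instead of each copy waiting on the previous increment. The two functions below show the before and after at the C level only; they are illustrative, assume the trip count is a multiple of 4, and are not how the old or new RTL unroller implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Unrolled by 4 WITHOUT giv splitting: every copy bumps the same
   pointer, so each access depends on the previous increment.  */
static void
scale_chained (int *p, size_t n, int k)
{
  assert (n % 4 == 0);
  int *end = p + n;
  while (p < end)
    {
      *p *= k; p++;
      *p *= k; p++;
      *p *= k; p++;
      *p *= k; p++;
    }
}

/* Unrolled by 4 WITH giv splitting: the copies use independent
   constant offsets and the pointer is updated once per iteration.  */
static void
scale_split (int *p, size_t n, int k)
{
  assert (n % 4 == 0);
  int *end = p + n;
  while (p < end)
    {
      p[0] *= k;
      p[1] *= k;
      p[2] *= k;
      p[3] *= k;
      p += 4;
    }
}
```

Both compute the same result; the split form removes the serial dependence between the copies, which is where the benefit in the old unroller came from.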


Re: uclibc patches

2005-03-26 Thread Simon Richter
Hi,
Peter S. Mazinger wrote:
I'm not sure *-*-linux-uclibc would be the right choice, as it suggests
running Linux with uclibc as your C library, which is something the
binutils need not care about. I could, however, see a case for
*-*-uclinux due to the ABI differences and the need for relocation
information in fully linked executables.

Maybe for binutils it does not make sense, but for gcc (libstdc++)
it will be needed.
I think it may be much better to teach gcc how to switch its standard
libraries easily, possibly removing the fixincludes step from the
all-host target, as a working gcc without system includes may be required
to bring the C library into a state where it can install its headers.
   Simon




Re: A plan for eliminating cc0 - Questions concerning the AVR target

2005-03-26 Thread Ian Lance Taylor
Björn Haase <[EMAIL PROTECTED]> writes:

> Imagine, you are having a clean md with a consistent "double set" 
> representation for all the patterns that actually alter the condition code. I 
> understood, that the problem for the optimization passes (e.g. combine) then 
> shows up only for instructions for which the second "set" actually happens to 
> be used. All of the other optimizations would not be deteriorated by the second
> "set".  IIUC combine would then also be able to simplify
> 
> (subtract with cc set)  -- double set, but actually only a single set is used
> (compare with cc set)   -- single set, no problem
> (read cc and branch)
> 
> to 
> 
> (subtract with cc set)
> (read cc and branch)
> 
> ? If this is true, I would not expect a serious deterioration of the
> optimization, at least not for the AVR target.

I don't see how combine could perform that particular optimization.
It would try to substitute the subtract into the compare, and fail to
recognize the result.  Combine would not try to generate parallel sets
and recognize that.  The CSE pass, on the other hand, might be able to
figure this out.  I'm not sure.

Note that combine was just an example of an optimization pass which
does not deal well with multiple sets.  Grep for single_set in the
compiler sources.
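The single_set problem can be modeled in a few lines. The types and toy_single_set below are hypothetical and only loosely mirror the real single_set: a subtract that also sets the condition codes still counts as a single set while nothing reads the flags, but stops being one as soon as a branch uses them, which is exactly the case the cc representation was meant to help.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy insn with several SETs, each flagged by whether its result is
   ever used (hypothetical types).  */
struct set { const char *dest; bool dest_unused; };
struct insn { size_t nsets; const struct set *sets; };

/* Return the one live SET, or NULL if none or several are live.  */
static const struct set *
toy_single_set (const struct insn *insn)
{
  const struct set *live = NULL;

  assert (insn->nsets > 0);
  for (size_t i = 0; i < insn->nsets; i++)
    {
      if (insn->sets[i].dest_unused)
        continue;
      if (live != NULL)
        return NULL;  /* two live SETs: passes relying on single_set give up */
      live = &insn->sets[i];
    }
  return live;
}
```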

> Is there a way that makes it possible that only reload uses the patterns that 
> save and restore the condition code while everywhere else the usual patterns 
> are generated? In case of the AVR target, most of the time reload will be 
> able to use processor instructions that do not require the save/restore 
> operations (it could freely access addresses up to a constant offset of 64 
> from the stack pointer) and the costly operations would not show up very 
> frequently (only for large offsets).  

You could do it by writing your insn patterns as expanders which check
reload_in_progress || reload_completed and under those circumstances
emit insns which do not clobber the condition codes.

Ian
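The expander approach can be sketched as a decision function. Everything here is hypothetical: GCC's reload_in_progress and reload_completed are globals rather than parameters, the enum names are invented, and the 0..63 offset limit assumes AVR's LDD/STD displacement addressing.

```c
#include <assert.h>
#include <stdbool.h>

/* Which kind of frame-access sequence to emit (hypothetical names).  */
enum frame_seq
{
  SEQ_NORMAL,           /* usual pattern; may clobber the condition codes */
  SEQ_DISPLACEMENT,     /* LDD/STD with a small offset; SREG untouched */
  SEQ_SAVE_RESTORE_CC   /* large offset: adjust the pointer, save/restore SREG */
};

static enum frame_seq
choose_frame_access (bool reload_in_progress, bool reload_completed,
                     int offset)
{
  assert (offset >= 0);
  /* Outside reload the ordinary, cc-clobbering patterns are fine.  */
  if (!(reload_in_progress || reload_completed))
    return SEQ_NORMAL;
  /* During or after reload, only the costly sequence for large
     offsets needs to preserve the condition codes explicitly.  */
  return offset <= 63 ? SEQ_DISPLACEMENT : SEQ_SAVE_RESTORE_CC;
}
```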


Re: Merging CCP and VRP?

2005-03-26 Thread Diego Novillo
On Sat, Mar 26, 2005 at 12:00:43PM -0500, Kazu Hirata wrote:

> Have you considered merging CCP and VRP (as suggested by Kenny last
> year at the summit)?
> 
By merging, do you mean *replacing* CCP with VRP?  Yes, it's
doable.  No, it's not a good idea.

Because of its lattice evaluation, VRP is about 3x slower than
CCP.  Consider that we currently schedule a single VRP pass and 3
CCP passes.  On cc1-i-files, that single VRP accounts for ~4
seconds of compile time, the three CCP passes account for ~5
seconds of compile time.

Also consider that currently our VRP pass is pretty quick because
it does not propagate branch probabilities.  It only evaluates
probabilities 0 and 1.  I have plans to change it so that it can
feed information to branch prediction.  That will make it even
more heavyweight.

Furthermore, VRP only deals with GIMPLE registers.  Our CCP pass
can propagate load/store constants.  If we burdened VRP with
loads and stores, it would be even slower.

So, while VRP can subsume some of the actions of CCP, it's much
slower and can't really be run all that often.  It's fine to
allow it to do some constant propagation.  But morphing the two
passes into one will not gain us much.

> :
>   if (a_1 == 0) goto ; else goto ;
> 
> :;
>   a_2 = 0;
>   bar (a_2);  <-- a_2 isn't replaced with 0 yet!
> 
> Note that we don't have bar (0) at the end.  This is because currently
> VRP does not have the equivalent of substitute_and_fold.  We just use
> the range information to fold COND_EXPR; we don't fold each statement
> using constants and ranges gathered by VRP.
> 
Right.  And that is something that is easily doable in
vrp_finalize.  It's in my todo pile, but if you want to do it, go
right ahead.


> So I am thinking about inserting ASSERT_EXPRs up front *before* going
> into SSA, without much of pruning, and then run an enhanced version of
>
I was doing this originally.  It turns out to be easier and
faster to insert the assertions once we are in SSA form.  If you
go back in time in the VRP code, you'll see that it evolved from
there.

What we could do is insert the assertions, do the various passes
that take advantage of them and then remove the assertions once
they are no longer necessary.  I still haven't read in detail
your plan for using ASSERT_EXPRs in the jump threader, but at
first sight it sounded decent.

> statements just before hitting loop optimizers.  :-) I have not
> figured out how to deal with ASSERT_EXPRs in FRE, but Daniel Berlin
> says I just have to teach tree-vn.c how to deal with it.
> 
At one point, all the passes had to deal with ASSERT_EXPRs.
Mostly by ignoring them.  Which is additional, unwanted work
because some of them had to actively know about them being
nothing but fancy copy operations.  That gets in the way of their
work.  I think that ASSERT_EXPRs should only survive as long as
they're useful.

> Last but not least, I'm willing to do the work, but I'd like to be
> more or less on the same page with other people hacking these scalar
> optimizations before I do any significant work.
> 
Sure.  Go ahead.  My short term plan is to start merging the
various components into mainline.  I'll start with the
incremental SSA patches, followed by VRP, the CCP and copy-prop
improvements.  Perhaps we could leave the changes to the threader
in TCB for a little while longer, but first I need to read your
proposal in detail.

> By the way, looking at Kenny's slides from last year, one thing we
> have not implemented in our propagation engine is to process the CFG
> worklist in the DFS order of a dominator tree.  I haven't looked
> closely at this, but if the speed of propagation is a concern, we
> should come back to this.
> 
ISTR either stevenb or dberlin implementing a dom-order
propagation.  I think they got a minor speedup that could be
worth having.


Diego.


Re: A plan for eliminating cc0

2005-03-26 Thread Paul Schlie
> From: Ian Lance Taylor 
> I'm also not aware of processors changing as you describe, except for
> the particular special case of SIMD vector instructions.  gcc can and
> does represent vector instructions as a single set.

- Understood; unfortunately, hiding the multiple-set nature of instructions
  which simultaneously set data and condition-state register values in an
  abstract new data mode type, as simd instructions do, won't likely be
  helpful: unlike multiple discrete simd values embedded in a vector
  data type, data and condition values tend to have different subsequent
  dependent evaluation paths.

> Yes, but the point of representing changes to the condition flags is
> precisely to permit optimizations when the condition flags are used,
> and that is precisely when the single_set assumption will fail.  You
> are correct that in general adding descriptions of the condition code
> changes to the RTL won't inhibit optimizations that don't meaningfully
> set the condition code flags.  But it will inhibit optimizations which
> do set the condition code flags, and that more or less obviates the
> whole point of representing the condition code setting in the first
> place.

- OK, I think I understand the difference in our perspective on the issue.

  In general it seems that much of the concern related to the optimization
  complexity relating to multi-set instructions is related to attempting
  to iteratively optimize instruction mappings?

  I presume that "code" can/should be optimally generated once by initially
  optimally covering the rtl representing a basic block (with minimal cost
  in either storage, cycles or some hybrid of both); where there's then no
  need to ever subsequently screw with it again (although various basic
  block partitioning resulting from various loop transformations strategies,
  etc. may require multiple mappings to determine their relative costs).
  
  This presumption ideally requires that the target be described as
  accurately as possible entirely in rtl, with no reliance on procedural
  or peephole optimization, relying entirely on GCC to cover the program's
  basic-block rtl optimally with rtl instruction description equivalents;
  thereby, by fully exposing all dependencies, an optimal instruction
  schedule will naturally result from an optimal rtl graph covering,
  without needing to perform an explicit further optimization, for example.
 
  (is this not feasible if the target is accurately described in rtl?)
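For comparison, the classic way to cover an expression tree at minimal cost is bottom-up dynamic programming over tiles, as in BURS-style instruction selectors. The node kinds, tiles and costs below are invented for the example, and GCC's RTL passes do not select instructions this way. Note also that this is guaranteed optimal only for trees; optimal covering of general DAGs, which is what a basic block's dataflow forms, is NP-complete.

```c
#include <assert.h>
#include <stddef.h>

enum op { REG, CONST, ADD, MEM };

struct node
{
  enum op op;
  struct node *kid[2];
  int cost;              /* memoized best cost, -1 until computed */
};

/* Tiles and costs (result always lands in a register):
     REG                  0
     CONST                1   load immediate
     ADD (x, y)           1 + cost (x) + cost (y)
     ADD (x, CONST)       1 + cost (x)   add immediate
     MEM (x)              1 + cost (x)   load
     MEM (ADD (x, CONST)) 1 + cost (x)   load with displacement  */
static int
cover_cost (struct node *n)
{
  int best = -1;

  if (n->cost >= 0)
    return n->cost;

  switch (n->op)
    {
    case REG:
      best = 0;
      break;
    case CONST:
      best = 1;
      break;
    case ADD:
      best = 1 + cover_cost (n->kid[0]) + cover_cost (n->kid[1]);
      if (n->kid[1]->op == CONST && 1 + cover_cost (n->kid[0]) < best)
        best = 1 + cover_cost (n->kid[0]);
      break;
    case MEM:
      best = 1 + cover_cost (n->kid[0]);
      if (n->kid[0]->op == ADD && n->kid[0]->kid[1]->op == CONST
          && 1 + cover_cost (n->kid[0]->kid[0]) < best)
        best = 1 + cover_cost (n->kid[0]->kid[0]);
      break;
    }
  assert (best >= 0);
  n->cost = best;  /* memoize so shared work is done once */
  return best;
}
```

For MEM (ADD (REG, CONST)) the displacement-load tile wins with cost 1, beating the add-immediate plus plain load (cost 2).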