Re: CC0 to CCmode conversion

2005-03-19 Thread Björn Haase

Hi Denis,

I have had a look at your patch. It generally seems to work. Presently it
still misses some optimizations where the old cc0 back-end was smarter. The
testsuite also reports a couple of new failures.

Yours,

Björn.

Here are a couple of cases where the old cc0 interface generated better code:

--- ./cc0/test.s 2005-03-09 21:45:49.0 +0100
+++ ./CC/test.s 2005-03-18 22:40:20.0 +0100
@@ -96,19 +96,21 @@

-   cpi r17,lo8(31)
+   cpi r17,lo8(30)
+   breq .+2
brsh .L14


-   cpi r17,lo8(30)
+   cpi r17,lo8(29)
+   breq .L18
brlo .L18


@@ -405,7 +415,7 @@
 /* prologue end (size=0) */
call Hardware_SetRedLED
lds r18,g_ucBuzzerOff
-   tst r18
+   cpi r18,lo8(0)
brne .L82
ldi r24,lo8(1000)
ldi r25,hi8(1000)


ldi r17,hi8(25)
rjmp .L151
 .L147:
-   cp __zero_reg__,r22
-   cpc __zero_reg__,r23
-   brlt .L151
+   cp r22,__zero_reg__
+   cpc r23,__zero_reg__
+   breq .+2
+   brpl .L151
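
The first two hunks differ only in how a comparison against a constant is
canonicalized: the cc0 code tests "x >= 31" and "x < 30" with a single
branch, while the new code tests "x > 30" and "x <= 29" and needs an extra
breq. The third hunk is the signed case "x > 0", which the cc0 code handled
by swapping the operands ("0 < x"). The following is only a minimal sketch
in plain C, run on the host, that checks these rewrites are value-preserving;
the function names are made up for illustration and nothing here is taken
from the patch itself.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if "x > c" and "x >= c + 1" agree for
   every unsigned 8-bit x.  For c == 255 the rewrite is not applicable
   (c + 1 does not fit into 8 bits), so 0 is returned.  */
int gtu_equals_geu_plus_one(uint8_t c)
{
    unsigned x;

    if (c == 255)
        return 0;
    for (x = 0; x <= 255; x++) {
        int gt = x > c;
        int ge = x >= (unsigned)c + 1u;
        if (gt != ge)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: same check for "x <= c" versus "x < c + 1".  */
int leu_equals_ltu_plus_one(uint8_t c)
{
    unsigned x;

    if (c == 255)
        return 0;
    for (x = 0; x <= 255; x++) {
        int le = x <= c;
        int lt = x < (unsigned)c + 1u;
        if (le != lt)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: for every signed 16-bit x, "x > 0" is the same
   test as "0 < x" with the operands swapped.  */
int gt_zero_equals_swapped_lt(void)
{
    long v;

    for (v = -32768; v <= 32767; v++) {
        int16_t x = (int16_t)v;
        int16_t zero = 0;
        if ((x > 0) != (zero < x))
            return 0;
    }
    return 1;
}
```

Calling gtu_equals_geu_plus_one(30) and leu_equals_ltu_plus_one(29)
corresponds to the two cpi hunks above; both rewrites save the extra breq
because brsh and brlo test the carry flag directly.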



Results from the testsuite run:

Tests that now fail, but worked before:

gcc.c-torture/compile/20031220-2.c  -O1  (test for excess errors)
gcc.c-torture/compile/20031220-2.c  -O2  (test for excess errors)
gcc.c-torture/compile/20031220-2.c  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  (test for excess errors)
gcc.c-torture/compile/20031220-2.c  -O3 -fomit-frame-pointer -funroll-loops
(test for excess errors)
gcc.c-torture/compile/20031220-2.c  -O3 -fomit-frame-pointer  (test for
 excess errors)
gcc.c-torture/compile/20031220-2.c  -O3 -g  (test for excess errors)
gcc.c-torture/compile/20031220-2.c  -Os  (test for excess errors)
gcc.c-torture/execute/20030606-1.c execution,  -O1
gcc.c-torture/execute/20030606-1.c execution,  -O2
gcc.c-torture/execute/20030606-1.c execution,  -O3 -fomit-frame-pointer
gcc.c-torture/execute/20030606-1.c execution,  -O3 -g
gcc.c-torture/execute/20030606-1.c execution,  -Os
gcc.c-torture/execute/931004-6.c execution,  -Os
gcc.c-torture/execute/builtins/memcmp.c execution,  -O1
gcc.c-torture/execute/builtins/memcmp.c execution,  -O2
gcc.c-torture/execute/builtins/memcmp.c execution,  -O3 -fomit-frame-pointer
gcc.c-torture/execute/builtins/memcmp.c execution,  -O3 -g
gcc.c-torture/execute/builtins/memcmp.c execution,  -Os
gcc.c-torture/execute/compare-1.c execution,  -O1
gcc.c-torture/execute/compare-1.c execution,  -O2
gcc.c-torture/execute/compare-1.c execution,  -Os
gcc.c-torture/execute/divcmp-1.c execution,  -O1
gcc.c-torture/execute/divcmp-1.c execution,  -O2
gcc.c-torture/execute/divcmp-4.c execution,  -O1
gcc.c-torture/execute/divcmp-4.c execution,  -O2
gcc.c-torture/execute/int-compare.c execution,  -O1
gcc.c-torture/execute/int-compare.c execution,  -O2

New tests that FAIL:

gcc.dg/tree-ssa/ssa-ccp-3.c scan-tree-dump-times link_error 0

New tests that PASS:

gcc.dg/tree-ssa/20030731-2.c scan-tree-dump-times if  1
gcc.dg/tree-ssa/20030917-1.c scan-tree-dump-times foo .defval 1
gcc.dg/tree-ssa/20030917-3.c scan-tree-dump-times printf.*, 0 1
gcc.dg/tree-ssa/20030922-2.c scan-tree-dump-times if  2
gcc.dg/tree-ssa/20031015-1.c scan-tree-dump-times V_MAY_DEF 2
gcc.dg/tree-ssa/20031022-1.c scan-tree-dump-times entry_exit_blocks.1..pred 1
gcc.dg/tree-ssa/20040210-1.c scan-tree-dump-times if  2
gcc.dg/tree-ssa/20040216-1.c scan-tree-dump-times Deleted dead store 2
gcc.dg/tree-ssa/20040305-1.c scan-tree-dump-times if  2
gcc.dg/tree-ssa/20040305-1.c scan-tree-dump-times Replaced 1
gcc.dg/tree-ssa/20040513-1.c scan-tree-dump-times \(_Bool\) 0
gcc.dg/tree-ssa/20040513-2.c scan-tree-dump-times link_error 0
gcc.dg/tree-ssa/20040514-1.c scan-tree-dump-times if  0
gcc.dg/tree-ssa/20040518-1.c scan-tree-dump-times if  0
gcc.dg/tree-ssa/20040518-2.c scan-tree-dump-times ABS_EXPR 1
gcc.dg/tree-ssa/20040518-2.c scan-tree-dump-times straightline 1
gcc.dg/tree-ssa/20040615-1.c scan-tree-dump-times bar2 0
gcc.dg/tree-ssa/20040624-1.c scan-tree-dump-times if  1
gcc.dg/tree-ssa/20040703-1.c scan-tree-dump-times 0\.0 0
gcc.dg/tree-ssa/20040721-1.c scan-tree-dump-times = G; 0
gcc.dg/tree-ssa/20040729-1.c scan-tree-dump-times &x 0
gcc.dg/tree-ssa/20040911-1.c scan-tree-dump-not VUSE 

Re: AVR: CC0 to CCmode conversion

2005-03-19 Thread Björn Haase
Hi Paul,

I have the impression that you are trying to open doors that are already
open :-) : IIUC, what Denis aims to do is to segment the re-organization of
the back-end into
several independent small steps. One step will be the cc0 -> CC_mode issue he 
is addressing now. The splitting issue would be one of the following steps. 
One will have to verify this point, but it seems that only the splitting 
issue requires accurate tracking of all the clobbers/settings of the 
condition code.

In my opinion segmenting the rework of the back-end would indeed be the best 
approach, also because I expect that the instruction patterns *with* 
splitting will be fairly different. E.g. I do not think that the "addsi3" 
will be present any more. So it would be probably a lot of useless work to 
add all of the clobbers for instruction patterns that are likely to vanish in 
the near future.

Yours,

Björn


Re: [AVR] RTL prologue/epilogue

2005-03-20 Thread Björn Haase
Hello Andy,

I have tested your patch concerning RTL prologue/epilogue. Congratulations: My
testsuite run only reports a single regression

Tests that now fail, but worked before:

gcc.c-torture/execute/20010122-1.c execution,  -O0 

. This happens on a testcase that is problematic anyway (it succeeds only for
a selected set of optimization switches).

Due to your "rjmp" trick the resulting code is also slightly tighter. IIUC,
there is now also hope for optimizing sequences like

call other_function
ret

to 
jmp other_function
.?

Yours,

Björn


Bootstrap fails on HEAD 4.1 for AVR

2005-04-03 Thread Björn Haase
Hi,

when checking out the gcc tree this morning for a clean rebuild and regular 
testsuite run, I observed that bootstrap failed. It seems that it is related 
to some preprocessor issue:

1.) The problem occurs when assembling the libgcc library. The first failing
operation is

/home/bmh/gnucvs/head/build/./gcc/xgcc -B/home/bmh/gnucvs/head/build/./gcc/ 
-B/usr/local/avr/bin/ -B/usr/local/avr/lib/ -isystem /usr/local/avr/include 
-isystem /usr/local/avr/sys-include -O2  -DIN_GCC -DCROSS_COMPILE   -W -Wall 
-Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition  -isystem ./include  -DDF=SF -Dinhibit_libc 
-mcall-prologues -g  -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc -I. 
-I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include 
-I../../gcc/gcc/../libcpp/include  -DL_mulqi3 -xassembler-with-cpp 
-c ../../gcc/gcc/config/avr/libgcc.S -c libgcc/./_mulqi3.o

and the error message reads

../../gcc/gcc/config/avr/libgcc.S: Assembler messages:
../../gcc/gcc/config/avr/libgcc.S:72: Error: suffix or operands invalid for 
`clr'
../../gcc/gcc/config/avr/libgcc.S:72: Error: no such instruction: `clear 
result'
../../gcc/gcc/config/avr/libgcc.S:74: Error: no such instruction: `sbrc r24,0'
../../gcc/gcc/config/avr/libgcc.S:75: Error: too many memory references for 
`add'
../../gcc/gcc/config/avr/libgcc.S:76: Error: too many memory references for 
`add'
../../gcc/gcc/config/avr/libgcc.S:76: Error: no such instruction: `shift 
multiplicand'
../../gcc/gcc/config/avr/libgcc.S:77: Error: no such instruction: `breq 
__mulqi3_exit'
../../gcc/gcc/config/avr/libgcc.S:77: Error: no such instruction: `while 
multiplicand!=0'
../../gcc/gcc/config/avr/libgcc.S:78: Error: no such instruction: `lsr r24'
../../gcc/gcc/config/avr/libgcc.S:79: Error: no such instruction: `brne 
__mulqi3_loop'
../../gcc/gcc/config/avr/libgcc.S:79: Error: no such instruction: `exit if 
multiplier=0'
../../gcc/gcc/config/avr/libgcc.S:81: Error: too many memory references for 
`mov'
../../gcc/gcc/config/avr/libgcc.S:81: Error: no such instruction: `result to 
return register'
make[2]: *** [libgcc/./_mulqi3.o] Fehler 1
make[1]: *** [stmp-multilib] Fehler 2
make: *** [all-gcc] Fehler 2

2.) My impression is that the problem is possibly related to some preprocessor 
issue because when executing

/home/bmh/gnucvs/head/build/./gcc/xgcc -B/home/bmh/gnucvs/head/build/./gcc/ 
-B/usr/local/avr/bin/ -B/usr/local/avr/lib/ -isystem /usr/local/avr/include 
-isystem /usr/local/avr/sys-include -O2  -DIN_GCC -DCROSS_COMPILE   -W -Wall 
-Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition  -isystem ./include  -DDF=SF -Dinhibit_libc 
-mcall-prologues -g  -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc -I. 
-I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include 
-I../../gcc/gcc/../libcpp/include  -DL_mulqi3 -xassembler-with-cpp 
-c ../../gcc/gcc/config/avr/libgcc.S -c libgcc/./_mulqi3.s

in order to have a look at the assembly output, no file _mulqi3.s is 
generated. Instead the output of the preprocessor is written to stdout.

Yours,

Björn


Sorry for the noise: Bootstrap fails on HEAD 4.1 for AVR

2005-04-03 Thread Björn Haase
When trying to figure out the origin of the problem, I have realized so far
that it obviously stems from a problem during my local configure process:
The xgcc I'm just building tries to pipe the asm result through my "host-as"
instead of the "target-as". I will have to look myself into why configure
chose the wrong assembler. Unfortunately, the error message I got from "make"
was not really instructive. So: Sorry for the noise.

Yours,

Björn


Re: Sorry for the noise: Bootstrap fails on HEAD 4.1 for AVR

2005-04-03 Thread Björn Haase
On Sunday, 3 April 2005 17:24, Peter Barada wrote:
> >When trying to figure out the origin of the problem, I have realized so
> > far, that it is obviously stems from a problem during my local configure
> > process: The xgcc I'm just building tries to pipe the asm result through
> > my "host-as" instead of the "target-as". I will myself have to look for
> > why configure chose the wrong assembler. Unfortunately, the error message
> > I got from "make" was not really instructive. So: Sorry for the noise.
>
> When you configured the cross compiler, did you have the target
> assembler in your PATH?  If not configure will use 'as' in your path
> and find your host assembler instead.
Actually, it seems that it is *not* sufficient to have the executables of the
binutils in your search path. I had just moved the binaries in /usr/local/bin
to some other location within the search path. Configure, however, did not
find them and switched to the host-as without any complaint or error
message :-(. When forcing configure by --prefix=/MyDirectoryWithTheBinaries/
to use another path, everything works fine.

Thanks
Björn


Re: Obsoleting c4x last minute for 4.0

2005-04-07 Thread Björn Haase

Joseph Myers wrote:
> One possible way of assessing activity would be to say that after 4.1
> maintained CPU ports should have test results for mainline regularly sent to
> gcc-testresults and monitored for regressions, though this rather depends on
> the willingness of maintainers of embedded ports to do this testing; ports
> without such testing and regression monitoring could be considered at risk.
>
> Only the following ports seem to have had results for 4.1.0-mainline (i.e.
> mainline since 4.0 branched) sent to gcc-testresults: alpha, arm, hppa,
> i?86/x86_64, ia64, mips, powerpc, s390, sh, sparc, although cris and mmix are
> evidently monitored for regressions even though they don't get test results
> to gcc-testresults.

Eric Weddington wrote:
> Add the AVR to the list of ports that (so far) haven't had test results sent
> to gcc-testresults. It's only been recently that the GCC test suite has been
> able to run for the AVR using an outside simulator. Hopefully in the future
> this will change; there's a lot of work being done on the AVR port.
 
I have posted at least two testsuite reports for the AVR family on
gcc-testresults, and I regularly test head 4.1 with the testsuite (about once
a week). The execution tests are run with the simulavr simulator. The
procedure of running the tests with simulavr is not of the turn-key type but
not too complicated either.

The reason why I have stopped posting the test results is that we are 
currently having 481 failures for the AVR target and the existing real bugs 
are completely hidden behind the huge number of failures due to issues like 
"test needs trampolines but does not communicate it" or "test case assumes 
int to be 32 bit". 
IMHO regularly posting the same huge bug list is not useful at all unless
one can distinguish between *real* and *pseudo* failures.

I had started to adapt the testsuite by adding functionality for communicating
that a test case assumes int to be 32 bit and by adding means to switch off
all tests that require trampolines.

Unfortunately, I did not get any response to the patch I had posted to 
gcc-patches a couple of months ago implementing additional effective target 
keywords :-(. A useful reworking of dozens of the affected test cases 
requires that new effective targets are present and that their names are 
agreed upon. Since I did not get any response to it, I have so far refrained
from continuing to work on testsuite adaptations.

Yours,

Björn


Re: Obsoleting c4x last minute for 4.0

2005-04-07 Thread Björn Haase
On Friday, 8 April 2005 01:06, Janis Johnson wrote:
>
> I should have done that, I must have missed seeing your patch.  I'll look
> for it now in the archives.
>
> Janis
I just had a look at the archives and found that the subject of the mail I 
have been sending was not very clear either :-) (and also it was only one 
month ago): Here is the reference:

http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00512.html

Yours,

Björn


Re: The subreg question

2005-04-17 Thread Björn Haase
Hi,

I have been working on very similar issues for the avr target. You might have 
a look at the patch I have posted today and the corresponding discussion 
thread at the gcc-patches list. 

I have also observed that gen_highpart and gen_lowpart sometimes cause an
ICE. I have not figured out so far why. I'd like to suggest that
a pattern that might help you is simply

(set (subreg:HI (match_dup 0) 0) (match_dup 2))
(set (subreg:HI (match_dup 0) 2) (match_dup 3))

with the preparation statements

int full_value = INTVAL (operands[1]);
int lsw_value = full_value & 0xffff;
if (lsw_value & 0x8000)
  lsw_value -= 0x10000;
// lsw_value must be in the range of -32768 ... 32767!

int msw_value = (full_value >> 16) & 0xffff;
if (msw_value & 0x8000)
  msw_value -= 0x10000;
// msw_value must be in the range of -32768 ... 32767!

operands[2] = GEN_INT (lsw_value);
operands[3] = GEN_INT (msw_value);

.
Be aware that your immediate operand might be a label reference, so that
INTVAL will fail on it since its value is not known at compile time. The only
workaround I have found so far for treating this case of label reference
immediates is not to split the instruction but to keep the unsplit insn until
instruction emission.

You will probably also wish to handle explicitly the cases where either
msw_value or lsw_value happens to be zero, so that you do not use the
standard template but emit the RTL yourself (for examples you might look
at my patch for AVR).
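
The preparation statements above can be tried out on the host with a small
stand-alone sketch. The struct and function names below are hypothetical and
not part of GCC; the sketch only shows the arithmetic (masking each 16-bit
half, sign-extending it into the range -32768 ... 32767, and noting which
halves are zero) and leaves out INTVAL, GEN_INT and the label reference case.

```c
#include <stdint.h>

/* Hypothetical result type: the two sign-extended 16-bit halves of a
   32-bit constant, plus flags for the "half is zero" special cases.  */
struct split_halves {
    int32_t lsw;          /* least significant word, -32768 ... 32767 */
    int32_t msw;          /* most significant word,  -32768 ... 32767 */
    int lsw_is_zero;
    int msw_is_zero;
};

/* Interpret the low 16 bits of BITS as a signed 16-bit value.  */
static int32_t sign_extend_16(uint32_t bits)
{
    int32_t v = (int32_t)(bits & 0xffffu);

    if (v & 0x8000)
        v -= 0x10000;
    return v;
}

/* Split FULL_VALUE the way the preparation statements do.  The value is
   converted to unsigned first so that the right shift is well defined
   for negative constants.  */
struct split_halves split_si_constant(int32_t full_value)
{
    uint32_t u = (uint32_t)full_value;
    struct split_halves h;

    h.lsw = sign_extend_16(u);
    h.msw = sign_extend_16(u >> 16);
    h.lsw_is_zero = (h.lsw == 0);
    h.msw_is_zero = (h.msw == 0);
    return h;
}
```

For example, split_si_constant(0x12348000) yields lsw == -32768 and
msw == 0x1234, and split_si_constant(0x00010000) sets lsw_is_zero, which is
the kind of case where one would emit a cheaper sequence by hand.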

Yours,

Björn


Re: My opinions on tree-level and RTL-level optimization

2005-04-17 Thread Björn Haase
Hi,

while lacking the deep insight into GCC internals most of you have, I'd
nevertheless like to ask you to be very prudent concerning the issue of
removal of seemingly unnecessary RTL optimizations.
In contrast to 32 bit targets, for 8 and 16 bit targets the RTL representation 
possibly might look completely different than the corresponding tree 
representation of the code: 
In my opinion, now that the new tree optimizations exist, it might finally be 
a good approach to let all the optimizations that could be done on original 
DI/SI/HI mode expressions be done on the tree level already. Then one could 
expand the DI/SI/HI mode expressions at RTL generation to only refer to the 
processor's native word length modes. What one would get then is RTL that
has almost nothing to do with the corresponding tree representation.

For 8/16 bit targets and gcc3 it seems that it used to be necessary to do the
expansion/splitting extremely late, since optimizations on SI/HI mode
expressions required keeping SI/HI mode objects at least until reload. Now
one could hopefully consider doing the splitting much earlier, and with the
help of the existing RTL optimizers one might be able to find many additional
optimization possibilities.

I think that it would be a pity if one could no longer find these 
optimizations because the corresponding RTL optimizers have been removed.

Yours,

Björn


Re: My opinions on tree-level and RTL-level optimization

2005-04-17 Thread Björn Haase
On Sunday, 17 April 2005 16:26, Daniel Jacobowitz wrote:
> On Sun, Apr 17, 2005 at 03:19:43PM +0200, Björn Haase wrote:
> > Hi,
> >
> > while lacking the deep insight into GCC internals most of you have, I'd
> > never the less like to ask you to be very prudent concerning the issue of
> > removal of seemingly unnecessary RTL optimizations.
> > In contrast to 32 bit targets, for 8 and 16 bit targets the RTL
> > representation possibly might look completely different than the
> > corresponding tree representation of the code:
> > In my opinion, now that the new tree optimizations exist, it might
> > finally be a good approach to let all the optimizations that could be
> > done on original DI/SI/HI mode expressions be done on the tree level
> > already. Then one could expand the DI/SI/HI mode expressions at RTL
> > generation to only refer to the processor's native word length modes.
> > What one would get then is a RTL which has about nothing to do with the
> > corresponding tree representation.
>
> This does not conflict with removing RTL optimizers.  Right now, the
> most natural time to do this sort of lowering is at expand; but there's
> no fundamental reason why it could not be done on trees, just before
> expand, and rerun relevant tree optimizers after doing so.  Same as the
> issues for "long long" splitting that Roger mentioned.
I agree that this would also be a solution, and one would probably do it this
way if one rewrote gcc from scratch. The only drawback that I see is that
someone who wishes to write a new back-end would then be required to
learn *two* intermediate representations simultaneously. (As a complete
beginner, I'd like to say that even understanding how to work with RTL
sometimes is not *really* intuitive :-) ...)

\begin{offtopic}
BTW. You have mentioned the possibility to re-run a couple of optimization 
passes later on. 

I am presently facing a situation (AVR) where condition code issues force us
to continue splitting a couple of patterns only after reload, and I think
that it would be helpful to re-run some passes after having done the
splitting. E.g. I am having sequences like

(set (subreg:QI (reg:HI 12) 0) (__some_memory_reference__))
(set (subreg:QI (reg:HI 12) 1) (const_int 0))
(set (subreg:QI (reg:HI 20) 0)
     (ior:QI (subreg:QI (reg:HI 20) 0) (subreg:QI (reg:HI 12) 0)))
(set (subreg:QI (reg:HI 20) 1)
     (ior:QI (subreg:QI (reg:HI 20) 1) (subreg:QI (reg:HI 12) 1)))

stemming, e.g., from an ior of a zero-extended QI value in r12:r13 with a HI
value in r20:r21. While I see that it is hopeless to tell GCC after reload
that it could have avoided allocating r13, I at least hope that the second
or-operation (on r13 and r21) and the storing of 0 into r13 could be
optimized away if a pass that is usually executed only *before* reload is
re-run.

I'd appreciate any hint on which place in the mid-end (i.e. which source file) 
I might want to look at ...
\end{offtopic}

Yours,

Björn


Re: internal compiler error at dwarf2out.c:8362

2005-04-17 Thread Björn Haase
James E Wilson wrote

>You shouldn't be trying to build your own types in a machine dependent 
>attribute handler function. The compiler's type system is determined by 
>front-ends mainly, and some middle-end infrastructure, and isn't your domain 
>to mess with. This stuff is subject to change, at which point your code may 
>break.

This seems to be a general issue for embedded targets. It's not only HC05 that
would benefit from support for different address spaces (e.g.
eeprom/ram/rom), since these possibly need different asm instructions to be
accessed.
If one should not use machine specific attributes, *is* there a standard way
in GCC to implement different address spaces?

Yours,

Björn


Re: internal compiler error at dwarf2out.c:8362

2005-04-19 Thread Björn Haase
On Tuesday, 19 April 2005 00:30, James E Wilson wrote:
> Björn Haase wrote:
> > In case that one should not use machine specific atttributes, *is* there
> > a standard way for GCC how to implement different address spaces?
>
> Use section attributes to force functions/variables into different
> sections, and then use linker scripts to place different sections into
> different address spaces.  You can define machine dependent attributes
> as short-hand for a section attribute, and presumably the eeprom
> attribute is an example of that.
>
> The only thing wrong with the eeprom attribute is that it is trying to
> create its own types.  It is not necessary to create new types in order
> to get variables placed into special sections.  There is nothing wrong
> with the concept of having an eeprom attribute.
Hi,

I am aware of the possibility to use section attributes for giving the linker
information on where to place which kind of object. The difficulty is that
for a couple of targets like AVR (and I think that the EEPROM issue for the
HC05 is very much the same problem) you are required to use a completely
different set of assembler instructions for accessing different regions of
memory. So it is not sufficient to simply give the linker the information
where to place which object.

To give you an example: if you are trying to access read-only program memory
on avr, you can only access it through a single pointer register (the Z
register), and the only addressing modes that are available are register
indirect and register indirect with post increment.

If you are accessing r/w data memory, you can make use of 3 different pointer
registers (X, Y, Z), and there are also instructions with immediate memory
addresses and register indirect addressing with offset, as well as a couple
of instructions that can directly reference memory (bit tests, e.g.).

The problem therefore is that the compiler itself would need to know which
type of memory reference it is presently working on in order to know which
set of instructions will be functional. So I think what would be needed
is something that is reflected in the type system. IIUC, this is the
background of the change in the type system that the previous message in the
thread is about.
What would be best is some kind of "sticky flag" that is carried around with
every tree node or RTL expression that stems from a memory reference that
has once been marked by a particular attribute.
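
To make the "sticky flag" idea a bit more concrete, here is a toy model in
plain C. It is not GCC code and all names are hypothetical: every memory
reference carries an address-space tag, address arithmetic preserves the tag,
and the instruction chosen for a load depends only on the tag. "lpm" and "ld"
are the real AVR mnemonics for program memory and data memory loads; the
EEPROM entry stands for a call to some helper routine and is purely an
assumption.

```c
/* Hypothetical address-space tag attached to every memory reference.  */
enum addr_space { AS_RAM, AS_FLASH, AS_EEPROM };

/* Toy memory reference: an address plus its sticky address-space tag.  */
struct mem_ref {
    unsigned address;
    enum addr_space space;
};

/* Address arithmetic must hand the tag on unchanged; this is the
   "sticky" part.  */
struct mem_ref mem_ref_offset(struct mem_ref base, unsigned delta)
{
    struct mem_ref r;

    r.address = base.address + delta;
    r.space = base.space;
    return r;
}

/* Pick the load instruction from the tag alone.  The EEPROM string names
   an assumed helper routine, not an existing library function.  */
const char *load_mnemonic(struct mem_ref ref)
{
    switch (ref.space) {
    case AS_FLASH:
        return "lpm";
    case AS_EEPROM:
        return "call hypothetical_eeprom_read_byte";
    default:
        return "ld";
    }
}
```

A reference created as { 0x100, AS_FLASH } and then offset by 4 still
selects "lpm", which is the property a section attribute alone cannot
provide, since the section is only known to the linker.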

Yours,

Björn


Unnesting of nested subreg expressions

2005-04-19 Thread Björn Haase
Hi,

when working on replacing avr's present monolithic SI-mode instruction
patterns by splitters after reload and by lowering to QI modes after expand,
I have stumbled over the following general issue:

The mid-end seems not to be able to simplify nested subreg expressions. I.e. 
it seems that there is no known transformation 

   (subreg:QI (subreg:HI (reg:SI xx) 0) 0) 
-> (subreg:QI (reg:SI xx) 0)

. I have stumbled over the problem when replacing the avr-target's present
xorsi3 define_insn by a corresponding define_expand explicitly using 4
subregs, i.e. (shown here for the shorter HI-mode analogue) after replacing

(define_insn "xorhi3"
  [(set (match_operand:HI 0 "register_operand" "=r")
        (xor:HI (match_operand:HI 1 "register_operand" "%0")
                (match_operand:HI 2 "register_operand" "r")))]
  ""
  "eor %0,%2
eor %B0,%B2"
  [(set_attr "length" "2")
   (set_attr "cc" "set_n")])

by

(define_expand "xorhi3"
 [(set (subreg:QI (match_operand:HI 0 "register_operand" "=r") 0)
   (xor:QI (subreg:QI (match_operand:HI 1 "register_operand" "%0") 0)
   (subreg:QI (match_operand:HI 2 "register_operand" "r")  0)))
  (set (subreg:QI (match_dup 0) 1)
   (xor:QI (subreg:QI (match_dup 1) 1)
   (subreg:QI (match_dup 2) 1)))]
  ""
  "")
 
So far I had seen no regressions on the testsuite. However, after adapting the
testcase gcc.c-torture/execute/20040629-1.c to compile also on int=16bits
targets, I am now getting an ICE. The error message reads:

/home/bmh/gnucvs/head/gcc/gcc/testsuite/gcc.c-torture/execute/20040629-1.c:139: 
error: unrecognizable insn:
(insn 28 27 29 0 (set (subreg:QI (reg:HI 59) 0)
(xor:QI (subreg:QI (reg:HI 42) 0)
(subreg:QI (subreg:HI (reg/v:SI 41 [ x ]) 0) 0))) -1 
(insn_list:REG_DEP_TRUE 68 (insn_list:REG_DEP_TRUE 3 (insn_list:REG_DEP_TRUE 
27 (nil
(nil))
/home/bmh/gnucvs/head/gcc/gcc/testsuite/gcc.c-torture/execute/20040629-1.c:139: 
internal compiler error: in extract_insn, at recog.c:2082
Please submit a full bug report,
with preprocessed source if appropriate.

The 20040629-1.c's bitfield operations generate HI mode subregs of SI mode
registers, and these HI mode subregs are themselves passed to the HI->QI mode
expander.

It seems that the cleanest solution would be to teach gcc how to unnest
subregs. Therefore my question: Is this possible and where would be the place
for doing this?
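
What "unnesting" would mean can be sketched with a toy model in plain C. The
types and the function below are hypothetical and have nothing to do with
GCC's real rtx data structures. The sketch assumes that subreg byte offsets
simply add up and that every level is a non-paradoxical subreg (the outer
mode is never wider than the inner one); anything else is rejected.

```c
#include <stddef.h>

/* Toy machine modes; the enum value is the size in bytes.  */
enum toy_mode { QI = 1, HI = 2, SI = 4 };

/* Toy expression: either a register or a subreg of another expression.  */
struct toy_rtx {
    int is_subreg;                /* 0: register, 1: subreg */
    enum toy_mode mode;
    unsigned byte;                /* subreg byte offset, 0 for registers */
    const struct toy_rtx *inner;  /* NULL for registers */
    int regno;                    /* register number, registers only */
};

/* Flatten a chain of nested subregs in X into at most one subreg of the
   innermost register and store it in *OUT.  Returns 0 on success and -1
   if the chain is malformed, paradoxical or out of range.  */
int toy_unnest_subreg(const struct toy_rtx *x, struct toy_rtx *out)
{
    enum toy_mode outer = x->mode;
    unsigned byte = 0;

    while (x->is_subreg) {
        if (x->inner == NULL || x->mode > x->inner->mode)
            return -1;
        byte += x->byte;
        x = x->inner;
    }
    if (byte + (unsigned)outer > (unsigned)x->mode)
        return -1;
    if (byte == 0 && outer == x->mode) {
        *out = *x;                /* the subregs cancel out completely */
        return 0;
    }
    out->is_subreg = 1;
    out->mode = outer;
    out->byte = byte;
    out->inner = x;
    out->regno = x->regno;
    return 0;
}
```

In this model, the operand (subreg:QI (subreg:HI (reg:SI 41) 0) 0) from the
error message flattens to a QI subreg of register 41 at byte 0.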
 
Yours,

Björn


BTW. I have stumbled over a similar issue when using the gen_highpart and
gen_lowpart functions for splitters after reload. It sometimes happens that
one of these functions also gets a subreg expression as input operand while
not being able to handle it. Both functions seem to fail as well when they
are working on a label reference immediate operand. It seems that in their
present form gen_lowpart and gen_highpart should be used only in DI-SI-mode
splitters, since then there is no danger that the DI mode expression itself is
a subreg of an even larger mode.


Re: Unnesting of nested subreg expressions

2005-04-21 Thread Björn Haase
Thank you for your answer!

On Thursday, 21 April 2005 05:11, James E Wilson wrote:
> Björn Haase wrote:
> > The mid-end seems not to be able to simplify nested subreg expressions.
> > I.e. it seems that there is no known transformation
> >(subreg:QI (subreg:HI (reg:SI xx) 0) 0)
>
> Nested subregs aren't valid.  You should refrain from creating them.
OK, understood now. So it's up to the back-ends to make sure that such an
expression is never emitted, both in define_expand and define_split.

> > (define_expand "xorhi3"
> >  [(set (subreg:QI (match_operand:HI 0 "register_operand" "=r") 0)
> >(xor:QI (subreg:QI (match_operand:HI 1 "register_operand" "%0") 0)
> >(subreg:QI (match_operand:HI 2 "register_operand" "r") 
> > 0))) (set (subreg:QI (match_dup 0) 1)
> >(xor:QI (subreg:QI (match_dup 1) 1)
> >(subreg:QI (match_dup 2) 1)))]
>
> If you have 16-bit registers, then I don't think there is any way to
> make this work as written.  Intra-register high-part subregs aren't
> generally valid either.  A high-part subreg is generally only valid when
> it is an entire-register of a multi-register value.

For AVR, we indeed have *only* 8 bit registers, so the template above works
quite well as long as operands[2] does not accidentally happen to be a subreg
expression.
The best solution I am seeing now probably is to avoid using a subreg-type
template for generating the low and high parts of operands[2] and to use
special functions for generating them instead. For this purpose, one would
probably consider extending the existing gen_highpart and gen_lowpart
functions so that they are also able to work on subreg input operands.

>
> You will have to use some other kind of rtl here, such as shift and
> masks, or zero_extract.
>
This might be true for >=16 bit targets. For AVR, the only difficulty I have 
been experiencing with above expand-pattern is the case of subreg input 
operands.

> > It seems that the cleanest solution would be to teach gcc how to unnest
> > subregs. Therefore my question: Is this possible and where would be the
> > place for doing this?
>
> Or you can fix your expander to stop creating nested subregs.  That is
> proabably much simpler than trying to teach the rest of the compiler to
> accept them.
> You can't rely on the fact that any expanded rtl will get simplified, so
> if we allow nested subregs, then everyplace that handles them needs to
> accept them, and that means an awful lot of code will have to change.
>

OK understood.

> > BTW. I have stepped over a similar issue when using the gen_highpart and
> > gen_lowpart functions for splitters after reload.
>
> I can't comment without details.
>
> > are working on a label reference immediate operand. It seems that in
> > their present form gen_lowpart and gen_highpart should be used only in
> > DI-SI-mode splitters since then there is no danger that the DI mode
> > expression itself is a subreg of an even larger mode.
>
> Except that the DImode expression could be a paradoxical subreg of a
> smaller mode, in which case you might have similar problems.

So far, I now think that the solution of the issue would be to extend
gen_lowpart and gen_highpart so that they are able to handle also subreg
inputs, and to use them at all places that emit RTL (i.e. expand and split).
The question is whether I should try to simultaneously implement support for
the case of paradoxical subregs as input operands, once I am working on the
code?

(I had been hoping that, there is a possibility to implement the HI->QI and 
SI->QI lowering by restricting the changes to the back-end's config 
directory. But, ok, it seems that one would have to touch at least some of 
the support functions ...)

Thank you again for your comment. Very important for a beginner to have such 
support.

Yours,

Björn


Re: different address spaces (was Re: internal compiler error at dwarf2out.c:8362)

2005-04-21 Thread Björn Haase
Martin,

I think that the AVR people would very much appreciate it if you would report
occasionally on your progress concerning your realization of the different
address space issue on your personal HC05 port. (In my opinion, the lack of
support for different memory spaces is the key weakness of the present avr
port, and I think that any avr-gcc user would be very happy if this weakness
could be removed some day.)

While I think, that I now have quite a clear picture on which kind of 
modifications would be required for the back-end (namely more sophisticated 
define_expand patterns and possibly some new unspecs (and corresponding 
define_insn/define_split patterns) for EEPROM and PMEM  ), I don't have any 
idea on how complicated the work on the tree level would be and of course, 
whether your approach generally would be suitable for being included in 
mainline gcc.

Yours,

Björn


Re: Unnesting of nested subreg expressions

2005-04-22 Thread Björn Haase

James E Wilson wrote:
> You were never very clear about what was wrong with gen_highpart and
> gen_lowpart with respect to subregs.  rtl examples are always helpful,
> e.g. showing RTL input and RTL output and pointing out what is wrong.
> gen_lowpart already has support for subreg input and presumably should
> work.
>
> gen_lowpart is already pretty involved.  If you need something other
> than a trivial fix, it might be better to try to solve the problem in
> your md file.
Ok, Indeed I had not been very explicit. So here is the story

1.) Background
avr has *only* QImode arithmetic operations and needs to split/expand any 
logic or arithmetic operation into individual QImode operations. The problem 
with the present port is that it pretends to have HImode and SImode operations 
by providing insns like andsi3. The splitting into QImode operations presently 
is never visible in the RTL and is done extremely late, at assembly output. Gcc 
therefore misses a lot of optimizations. I am presently trying to remove the 
define_insns for the HImode and SImode operations by implementing the lowering 
to QImode at expand or, at least, after reload (for those operations that leave 
the condition code in a usable state, like addsi3).
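
To make the lowering concrete, here is a minimal C sketch (not port code) of what the split amounts to arithmetically: a 16-bit AND becomes two independent 8-bit ANDs, and a 16-bit add becomes an 8-bit add followed by an 8-bit add-with-carry. All function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit AND carried out as two independent 8-bit ANDs
   (low byte first, as on a little-endian target).  */
uint16_t
and16_bytewise (uint16_t a, uint16_t b)
{
  uint8_t lo = (uint8_t) ((a & 0xffu) & (b & 0xffu));
  uint8_t hi = (uint8_t) ((a >> 8) & (b >> 8));
  return (uint16_t) (((unsigned) hi << 8) | lo);
}

/* 16-bit addition as an 8-bit "add" followed by an 8-bit
   "add with carry".  */
uint16_t
add16_bytewise (uint16_t a, uint16_t b)
{
  unsigned lo = (a & 0xffu) + (b & 0xffu);
  unsigned carry = lo >> 8;                 /* carry out of the low byte */
  unsigned hi = (a >> 8) + (b >> 8) + carry;
  return (uint16_t) (((hi & 0xffu) << 8) | (lo & 0xffu));
}

/* Small self-check comparing against the native 16-bit operations.  */
void
check_bytewise (void)
{
  assert (and16_bytewise (0x1234, 0xff00) == (0x1234 & 0xff00));
  assert (add16_bytewise (0x00ff, 0x0001) == 0x0100);
}
```

The two byte operations of the AND do not depend on each other, whereas the second byte operation of the add depends on the carry of the first. Only a split that is visible in the RTL lets the optimizers see this difference.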

2.) Where the problem with nested subregs shows up:

One example of a pattern that I have written for implementing, e.g., andhi3 
reads:

(define_insn_and_split "andhi3"
  [(set (match_operand:HI 0 "register_operand" "=r,d")
        (and:HI (match_operand:HI 1 "register_operand" "%0,0")
                (match_operand:HI 2 "nonmemory_operand" "r,i")))]
  ""
  "*{
  /* Keep insn until asm output for handling the case of label refs. */
  if (which_alternative==1)
    {
      return (AS2 (andi,%A0,lo8(%2)) CR_TAB
              AS2 (andi,%B0,hi8(%2)));
    }
  return "bug";
}"
  "reload_completed
   && ((GET_CODE(operands[2]) == CONST_INT) || REG_P(operands[2]))"
  [(set (subreg:QI (match_dup 0) 0)
        (and:QI (subreg:QI (match_dup 1) 0)
                (subreg:QI (match_dup 2) 0)))
   (set (subreg:QI (match_dup 0) 1)
        (and:QI (subreg:QI (match_dup 1) 1)
                (subreg:QI (match_dup 2) 1)))]
  "if (GET_CODE(operands[2]) == CONST_INT)
     {
       /* If operands[2] is a register, use the template above. */
       /* In case of const ints emit rtl ourselves for optimizing */
       /*  away andi reg,0xff and replace andi reg,0 by ldi reg,0. */
...

I first thought that the second alternative for operands[2] would always 
require that operands[2] itself is a register. It seems, however, that there 
still remain some, admittedly rare, occasions where operands[2] itself is a 
(subreg:HI (reg:SI ) 0). In these cases (subreg:QI (match_dup 2) 0) 
generates one of the nested subreg expressions.

So I thought about using helper functions and changing the templates to 
something similar to
  [(set (subreg:QI (match_dup 0) 0)
        (and:QI (subreg:QI (match_dup 1) 0)
                (match_dup 3)))
   (set (subreg:QI (match_dup 0) 1)
        (and:QI (subreg:QI (match_dup 1) 1)
                (match_dup 4)))]
with

operands[3] = gen_lowpart (QImode,operands[2]);
operands[4] = gen_highpart (QImode,operands[2]);

It seems, however, that this second option does not solve the problem. So 
what I am looking for is a function that takes operands[2] for patterns like 
the one above and returns the appropriate subreg RTL, handling both the 
case that operands[2] is a regular HImode register and the case that 
operands[2] happens to be a subreg:HI of an even wider mode.
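
The behaviour I have in mind can be sketched on a toy data structure. This is not GCC's rtx and not an existing GCC function: struct toy_rtx and toy_flat_subreg are hypothetical names, and the sketch assumes little-endian byte numbering, so that the byte offsets of nested subregs simply add up.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an operand: either a plain register or a subreg
   that selects bytes of another operand.  Not GCC's real rtx.  */
struct toy_rtx
{
  int is_subreg;                /* 0: register, 1: subreg */
  int size;                     /* size of this expression in bytes */
  int regno;                    /* register number if !is_subreg */
  const struct toy_rtx *inner;  /* wrapped operand if is_subreg */
  int byte;                     /* byte offset into inner if is_subreg */
};

/* Return the SIZE-byte piece at offset BYTE of OP, expressed directly
   in terms of the innermost register.  All subreg layers of OP are
   stripped and their offsets summed, so the result is never nested.  */
struct toy_rtx
toy_flat_subreg (const struct toy_rtx *op, int size, int byte)
{
  struct toy_rtx result;

  assert (byte >= 0 && byte + size <= op->size);
  while (op->is_subreg)
    {
      byte += op->byte;
      op = op->inner;
    }
  result.is_subreg = 1;
  result.size = size;
  result.regno = -1;
  result.inner = op;
  result.byte = byte;
  return result;
}
```

Called on a plain HImode register, this yields the usual one-level subreg. Called on the toy equivalent of (subreg:HI (reg:SI ) 0) with byte 1, it yields a subreg of the SImode register at byte 1 instead of a nested expression.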

I have tried to find out whether such a function already exists in the 
existing code. Unfortunately, it's a bit hard to guess solely from the 
comments for the functions, e.g. in emit-rtl.c, what the different variants 
of gen_highpart and gen_lowpart, gen_subreg, simplify_subreg, etc. are meant 
to be used for.

Yours,

Björn


RFC: May inttypes.h be used by testcases of the C-testsuite?

2005-05-05 Thread Björn Haase
Hi,

brief question:

Would it be OK to make use of inttypes.h in testcases of the C-testsuite? 
By replacing, e.g., "int" by "int32_t" where necessary, one could compensate 
for the implicit assumption of a handful of testcases that int is 32 bits 
wide, so that they also work for targets where int is 16 bits wide.
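
A minimal sketch of the kind of change I mean (a made-up function, not taken from an actual testcase):

```c
#include <assert.h>
#include <stdint.h>   /* int32_t; <inttypes.h> pulls this header in as well */

/* Written with plain "int", the multiplication below would overflow on
   a target where int is 16 bits wide.  With int32_t the arithmetic is
   done in 32 bits on every target that provides the type.  */
int32_t
scale (int32_t x)
{
  return x * (int32_t) 1000;
}

/* The check a testcase would perform.  */
void
check_scale (void)
{
  assert (scale (100) == 100000);
}
```

On a target with 32-bit int nothing changes, since int32_t is then typically just a typedef for int.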

IMO doing this would be better than totally skipping these test cases for 
targets where int is 16 bits. (This way I have, e.g., already found a bug 
in a patch for the avr port that I committed recently!)

Comments would be appreciated.

Yours,

Björn


RFD: Is there a helper function like "print_rtx_to_stdout" ?

2005-05-06 Thread Björn Haase
Hi,

I am facing a situation where a gcc_assert call that checks for some 
properties of an rtx expression, say "op", triggers an ICE (see below). I'd 
like to have a look at the rtx that triggers this error. For this reason, I'd 
like to know whether there exists a helper function for writing to stdout 
which kind of rtx "op" actually is, i.e. some function like 
"print_rtx_to_stdout" or "print_rtx_to_file" that could be used when 
debugging the compiler.

Thanks in advance.

Björn

Sample code:

rtx
simplify_subreg (enum machine_mode outermode, rtx op,
 enum machine_mode innermode, unsigned int byte)
{
  /* Little bit of sanity checking.  */
  gcc_assert (GET_MODE (op) == innermode || GET_MODE (op) == VOIDmode);

  ...


RFC: (use) useful for avoiding unnecessary compare instructions during cc0->CCmode ?!

2005-05-14 Thread Björn Haase
Hi,

I have been thinking about how to overcome part of the "double-setter" 
difficulties that arise when implementing cc0->CCmode conversion for a couple 
of targets:

IIUC, one of the two or three difficulties with cc0->CCmode
conversion is that combine in its present form is not able to
recognize that in a sequence like

(parallel [
  (set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))
  (set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))])
(set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))

the compare instruction could be avoided. IIUC it is presently suggested to 
make use of *many* peephole2 patterns for solving this problem.

IIUC the key reason why present combine would miss the opportunity is that 
many optimizations are not possible for "double-set" instructions.

In order to overcome this problem, it seems to be a possible approach to
expand double-set instructions that, e.g., leave the condition code in 
a useful state to sequences containing "announce" instructions. I.e. expand
would insert two instructions after the double-set instruction that contain
the two individual sets and an additional "use" statement. I.e. above
sequence after expand then would look like

(parallel [
  (set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))
  (set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))])
(parallel [(set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))
           (use (reg:SI 100))])
(parallel [(set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))
           (use (reg:CC_xxx CC))])
(set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))
 
The patterns for the "announce" instructions would never generate any 
assembly output and would have length "0". They are merely used for 
communicating to the mid-end that a double-set instruction has a possibly
useful side-product.

When inserting such "announce" instructions, I think that even the present 
combiner will be able to realize that the second compare instruction is not 
necessary.

I have confirmed with another example of double-set instructions (divmodsi4 
for the AVR target) that the above approach indeed works. It seems, however, 
that what one is using is a kind of undocumented side-effect that could be 
subject to change in future releases. I.e. the documentation on (use) 
presently "warns" that some optimization *could* take place when using 
(use) statements, resulting in bugs when (use) is used in a wrong way.
I think that it would be helpful to make this partially documented behaviour 
of the (use) clauses an official feature. I.e. I think that it would help to 
agree that the optimizations the present documentation "warns" of "are 
supposed" to take place. (Maybe this is already commonly agreed on?)
 
Comments?

Yours,

Björn


Re: RFC: (use) useful for avoiding unnecessary compare instructions during cc0->CCmode ?!

2005-05-14 Thread Björn Haase
On Saturday, 14 May 2005 21:39, Alexandre Oliva wrote:
> On May 14, 2005, Björn Haase <[EMAIL PROTECTED]> wrote:
> > I.e. expand
> > would insert two instructions after the double-set instruction that
> > contain the two individual sets and an additional "use" statement. I.e.
> > above sequence after expand then would look like
> >
> > (parallel [
> >   (set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))
> >   (set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))])
> > (parallel [(set (reg:SI 100) (minus:SI (reg:SI 101) (reg:SI 102)))
> >            (use (reg:SI 100))])
> > (parallel [(set (reg:CC_xxx CC) (compare (reg:SI 101) (reg:SI 102)))
> >            (use (reg:CC_xxx CC))])
>
> You'd then have to some how arrange for the second and third insns to
> not be removed as redundant, and come up with some additional work
> around for the case when there's an overlap between output and input.
Yes. I agree with both of your remarks. 

Concerning removal of the redundant expressions: It seems that the fact that 
condition codes could be reused is identified already during CSE or GCSE, and 
at this time the passes removing dead code seem not to have removed the 
redundant patterns. At least, since this worked for the case of "divmod", I 
am having hope that this would work as well for condition codes.

Concerning your second remark: Indeed one probably would need to rewrite the 
third pattern as 
 (parallel [(set (reg:CC_xxx CC) (compare (reg:SI 100) (const_int 0)))
            (use (reg:CC_xxx CC))])
For the divmod issue, I have observed a very similar difficulty (there, 
however, there is no problem with overlapping inputs and outputs due to a 
couple of additional moves; the problem there is that the register allocator 
falsely assumes that the input register operands still need to be alive for 
the announcement instruction, and this prevents some optimizations).

What might be a bit hairy is to make sure that the real compare instructions 
(i.e. the possibly "unnecessary" compares I am aiming to remove) that are 
emitted at expand result in RTL that match as well as possible the 
"announcement" patterns that are generated by the arithmetic/logic 
operations.
Anyway, I have hope that by using this method one would at least be able to 
handle condition code reuse for code segments like

int a,b,c;

c = a - b;
if (c > 0)
{ 
   // do something
 ;
}

without requiring the use of peepholes.

Yours,

Björn


Re: RFC: Strategy for cc0 -> CCmode conversion for the AVR target.

2005-06-05 Thread Björn Haase
Thanks for your response,

On Sunday, 5 June 2005 04:16, Ian Lance Taylor wrote:
> > The condition-code re-use issue is the point, where, IMO, the link to the
> > subreg-lowering 2.) shows up. After, e.g., breaking down a HI mode "sub"
> > operation into two QI mode "sub" and "sub-with-carry"s at expand, I
> > consider it to be extremely difficult to make the mid-end smart enough to
> > identify that a the end of the QI "sub-with-carry" the condition code is
> > set according to the corresponding HImode substract operation. For DImode
> > operations the mid-end would already need to take 8 (!) Instructions into
> > account for finding out what the calculated condition code actually
> > represents. This, also, will be a major difficulty when considering Ian's
> > suggested optimizer pass after reload.
>
> I agree that there is a problem here, but it's not clear to me how you
> can address it under the current cc0 scheme either.
I agree. That problem exists with the cc0 scheme as well. What I was 
aiming to suggest is: if it is necessary to work on CC re-use for the 
cc0->CCmode transition anyway, let's try to use a path that makes it possible 
to switch to splitting after expand without facing too many problems.

> > ; Additional "Marker" instructions to be used by GCSE
> > (parallel [
> >  (use (reg:CC_cc SR))
> >  (set (reg:CC_cc SR) (compare:HI (operands[0]) (const_0))
> >  (note "please delete the entire embracing parallel instruction before
> > register life-time analysis by a new pass: It pretends to use operand[0]
> > while in fact this instruction does nothing except from giving hints to
> > GCSE.")
> > ])
> > (parallel [
> >  (use (operands[0]))
> >  (set (operands[0]) (minus:HI (operands[1]) (operands[2]))
> >  (note "please delete the entire embracing parallel instruction before
> > register life-time analysis by a new pass: It pretends to use operands 1
> > and 2 while in fact this instruction does nothing except from giving
> > hints to GCSE.")
> > ])
>
> I'm not crazy about these marker instructions personally.  They are
> describing something which I think could be handled via parallel sets
> or register notes.  
The advantage that I see in using actual instructions is that the 
present implementation of (G)CSE is already able to work with the way the 
information is stored in the marker instruction. And it is, IMO, one possible 
method for carrying around the information about what is actually contained 
in the zoo of subregs. One would have a bit of freedom to choose at which time 
the information concerning the larger modes is discarded.

The reason why (IMO) it is not possible to realize the same functionality with 
"double set" instructions is that the condition code generally depends on 
*all* of the involved subregs. It will be set, however, only by the last 
instruction of the sequence, and this one works only on some *sub*-set of the 
input operands. I.e. in the example the later "subtract with carry" 
operation (the one working on the most significant byte) would have to 
pretend to use both the least significant bytes and the most significant 
bytes if it included a parallel set for writing the condition code of the 
entire sequence. The double-set instruction would therefore include an 
incorrect dependency. This will cause problems, e.g. when the input and 
output operands of the earlier instructions overlap: Gcc would be forced to 
hold an unchanged copy of the overwritten input operands in additional 
registers since it assumes that the last instruction needs them.
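
The dependency can be illustrated with a small C model of the flags. It is only a sketch: the struct and function names are hypothetical, only the Z and C flags are modelled, and it assumes AVR-like behaviour where subtract-with-carry leaves Z set only if Z was already set and its own result byte is zero.

```c
#include <assert.h>
#include <stdint.h>

struct flags
{
  int z;   /* zero flag */
  int c;   /* carry (borrow) flag */
};

/* 8-bit subtract: the flags depend on this byte only.  */
uint8_t
sub8 (uint8_t a, uint8_t b, struct flags *f)
{
  uint8_t r = (uint8_t) (a - b);
  f->c = a < b;
  f->z = (r == 0);
  return r;
}

/* 8-bit subtract with carry: Z stays set only if it was set before
   and this result byte is zero as well, so after the last byte it
   summarizes all bytes of the wide operation.  */
uint8_t
sbc8 (uint8_t a, uint8_t b, struct flags *f)
{
  unsigned borrow_in = (unsigned) f->c;
  uint8_t r = (uint8_t) (a - b - borrow_in);
  f->c = (unsigned) a < (unsigned) b + borrow_in;
  f->z = f->z && (r == 0);
  return r;
}

/* 16-bit subtract built from the two byte operations.  */
uint16_t
sub16 (uint16_t a, uint16_t b, struct flags *f)
{
  uint8_t lo = sub8 ((uint8_t) (a & 0xff), (uint8_t) (b & 0xff), f);
  uint8_t hi = sbc8 ((uint8_t) (a >> 8), (uint8_t) (b >> 8), f);
  return (uint16_t) (((unsigned) hi << 8) | lo);
}

/* The final flags describe the whole 16-bit operation.  */
void
check_flags (void)
{
  struct flags f;

  assert (sub16 (0x1234, 0x1234, &f) == 0 && f.z == 1 && f.c == 0);
  assert (sub16 (0x0200, 0x0100, &f) == 0x0100 && f.z == 0);
}
```

In sbc8 the data inputs are only the two high bytes. The information about the low bytes reaches it solely through the flags, which is exactly the part a parallel set on the last instruction could not express without naming the low bytes as extra inputs.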

> The more serious problem I see is that if part of 
> the subtraction disappears for some reason, the information, however
> stored, will be incorrect.  How can that problem be avoided?

IIUC, the existence of the use statement in the marker instruction prevents 
the registers written by the subtraction from ever being marked as "dead" 
unless the marker instruction itself is removed. So far I have expected that 
no pass would be allowed to remove an instruction if a later instruction 
depends on its result, independent of whether the dependence stems from a 
(use) statement or from the use of the value in a conventional arithmetic or 
logic operation.
Maybe I have misunderstood something, but so far I do not see a problem that 
one would need to avoid.

Yours,

Björn


P.S.: 
As a detail: I have just realized that the first one of the three marker 
instructions (the one setting CC according to the compare of operands[1] and 
operands[2]) probably would have to be placed in front of the two 
instructions writing (operands[0]). operands[0] and operands[1] possibly 
could overlap.


Re: Expanding an ADDSI3 into 2 x ADDHI3 does not work

2005-06-16 Thread Björn Haase
> If I use:
> (define_expand "addsi"
>  [(set (match_operand:SI 0 "general_operand" "=g")
>        (plus:SI (match_operand:SI 1 "general_operand" "g")
>                 (match_operand:SI 2 "general_operand" "g")))]
>   ""
>   "{
>   emit_insn (gen_addhi3 (custom_subword(operands[0], 0, SImode),
>                          custom_subword(operands[1], 0, SImode),
>                          custom_subword(operands[2], 0, SImode)));
>   emit_insn (gen_addhi3 (custom_subword(operands[0], 1, SImode),
>                          custom_subword(operands[1], 1, SImode),
>                          custom_subword(operands[2], 1, SImode)));
>   DONE;
>}" )
>
> the output becomes a mess of addqi, cmpqi, and branches.
Would it help to use (define_expand "addsi3" ...) instead of (define_expand 
"addsi" ...) ?

Yours,

Björn


RFH: intra-procedure optimizations "CALL_REALLY_USED_REGISTERS"

2005-08-07 Thread Björn Haase
Hello,

The avr port presently misses possible intra-procedure optimizations 
concerning register use. Optimizations are missed when 1) calling a leaf 
function that is 2) defined in the same unit as the caller and 3) clobbers 
only a subset of the call-clobbered registers. Presently I observe that the 
caller still saves all of the "CALL_REALLY_USED_REGISTERS" whether they are 
*really* used or not.

Is there a way to make leaf functions to be compiled first so that when 
starting with non-leaf functions in the same unit expand could insert 
detailed information on which subset of registers is actually clobbered by 
calls to leaf functions?

When reading the comments in cgraph.c for the cgraph_optimize function 
"Backend can then use this information to modify calling conventions" it 
seems to me that there is an interface already for modifying the calling 
convention. Is there a port that uses it so that one could find out what the 
interface looks like?

Yours,

Bjoern Haase


Re: How can I build gcc on my Windows PC?

2005-08-07 Thread Björn Haase
David Nowak wrote
>Do I need a c compiler to build gcc on my Windows PC?  If so, where
>can I get one?  I downloaded both MinGW and Cygwin, but neither seems
>to have a c compiler.  Please help me.  Thank you.

Cygwin *includes* working gcc binaries. You probably just did not tick 
the right checkboxes during installation.

Yours,

Björn Haase

BTW: Your question is probably better posted on the gcc-help list.


Re: RFH: _inter_-procedure optimizations "CALL_REALLY_USED_REGISTERS"

2005-08-07 Thread Björn Haase
Steven Bosscher wrote on Sunday, 7 August 2005 12:45:
> On Sunday 07 August 2005 09:35, Björn Haase wrote:
> > Hello,
> >
> > The avr port presently misses possible intra-procedure optimizations
> > concerning register use.
>
> What you describe is an _inter_procedural optimization.  "Between
> procedures".  You want to use the result of some analysis done in one
> function to expose extra optimization opportunities in another function.
> That is interprocedural.
OK, thanks :-).

> > Is there a way to make leaf functions to be compiled first
>
> Leaf functions _are_ compiled first.
>
> > so that when
> > starting with non-leaf functions in the same unit expand could insert
> > detailed information on which subset of registers is actually clobbered
> > by calls to leaf functions?
>
> Not at the moment.  It is probably not very hard to implement something
> like this idea, though.  Note that you can even propagate this detailed
> information across the call graph if you could compute it.

I think that this kind of optimization could help very much for my fractional 
numerics library, where some functions tend to be lengthy but use very few 
registers. I would be willing to try to implement it.

I can imagine two possible approaches for addressing the issue: one that only 
needs adaptations in the back-end and implements the optimization for just one 
target, and a more generic solution.


1.) target specific approach

The back end implements a database that stores the register usage of all the 
functions that so far have been compiled and the symbol ref of these 
functions. IIUC this could be done in the final stage when generating the asm 
text output.

The define_insn for the named call pattern is replaced by a smart 
define_expand that looks in the database whether anything on the 
function-to-be-called is known. It could look whether the database already 
contains an entry with the same label ref. 
If not, it generates the usual call pattern. If there *is* an entry in the 
database it generates a call_insn where the CALL_INSN_FUNCTION_USAGE contains 
the appropriate use and clobber RTL.
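
A rough C sketch of such a database, only to illustrate the lookup logic: the names, the fixed-size table and the bit-mask representation are assumptions made for this example and are not existing GCC interfaces. The default mask has the bits for r18-r27 and r30/r31 set as a stand-in for the full call-clobbered set.

```c
#include <assert.h>
#include <string.h>

#define MAX_KNOWN_FUNCS 64

/* Bit N set means "register N is clobbered by the call".  Stand-in
   for the conservative default (here: r18-r27, r30, r31).  */
#define DEFAULT_CALL_CLOBBERS 0xCFFC0000ul

struct func_clobbers
{
  const char *name;         /* symbol name of the compiled function */
  unsigned long clobbered;  /* registers it actually clobbers */
};

static struct func_clobbers known_funcs[MAX_KNOWN_FUNCS];
static int n_known_funcs;

/* Record the register usage of a function once it has been compiled.
   Returns 0 if the table is full, 1 otherwise.  */
int
record_function_clobbers (const char *name, unsigned long clobbered)
{
  assert (name != NULL);
  if (n_known_funcs == MAX_KNOWN_FUNCS)
    return 0;
  known_funcs[n_known_funcs].name = name;
  known_funcs[n_known_funcs].clobbered = clobbered;
  n_known_funcs++;
  return 1;
}

/* Clobber set to attach to a call to NAME: the recorded subset of the
   default if the callee is already known, the full default otherwise.  */
unsigned long
call_clobbers_for (const char *name)
{
  int i;

  assert (name != NULL);
  for (i = 0; i < n_known_funcs; i++)
    if (strcmp (known_funcs[i].name, name) == 0)
      return known_funcs[i].clobbered & DEFAULT_CALL_CLOBBERS;
  return DEFAULT_CALL_CLOBBERS;
}
```

A call to a function that has not been compiled yet falls back to the default, so the scheme can only win when the callee is emitted before its callers.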

2.) generic

One could of course try to store the information on register usage somewhere 
in the tree structure. 
One therefore would add a target macro CALL_FUNCTION_CLOBBERED_REGISTERS that 
returns a pointer to static memory of a struct of the same type as the 
"CALL_REALLY_USED_REGISTERS" struct with the only difference that this struct 
could change. If the value returned is not NULL, the mid end then would copy 
this struct to a new location on the heap and store a link to this storage 
somewhere in the tree data structure.

One would then make the default expander for calls do the work of adapting the 
CALL_INSN_FUNCTION_USAGE field if information is available. Probably one 
would then not make use of the name of the label ref of the function entry 
label. The default expander probably would have a pointer to some tree node 
that describes the function to be called?


I think that I have a clear idea of how much work it would be to 
implement 1). IMO, it would not be too much work (the only open question would 
be where to place the code that finds out which registers are used).

However, I'd prefer method 2) since I think that other targets might as well 
benefit from the optimization. I only cannot judge how much work it would be 
to get the additional information stored within the tree.

I'd appreciate opinions on whether there is a chance of getting approach 2) 
integrated into gcc mainline. I also would be thankful for a short hint where 
the function is that would have to use the suggested 
CALL_FUNCTION_CLOBBERED_REGISTERS macro and where within the mid-end
I would find the default expander for insn calls (I thought it was in optabs.c 
but I did not find it immediately).


Greetings,

Bjoern Haase


Re: RFH: _inter_-procedure optimizations "CALL_REALLY_USED_REGISTERS"

2005-08-09 Thread Björn Haase
Steven Bosscher wrote:
> In
> any case, you should assume that it is a much bigger job than just
> modifying the call expander.

OK, I had a closer look at what is happening in present-state gcc and I 
understand that it is indeed a much more complex task than I first thought. 
One issue would also be that it probably would be worth considering which 
other kinds of information (apart from register clobbering information) on a 
static leaf function would be worth sharing with the caller:

memory usage of the called function / memory clobbers, information on whether 
it's a pure function?

Probably once one starts to try to implement kind of a database containing 
information on all the functions that are already compiled, one might make 
efforts to support all of the most useful information. Of course there always 
will be trade-offs concerning compilation time and memory consumption. 

I still would like to try my very best to look after this, but I'd prefer to 
start by scanning the literature. If anyone could help me out by recommending 
good articles / textbooks dealing with inter-function optimizations, it would 
be appreciated :-).

Yours,

Björn


RFC: Appropriate method for target-specific mode-substititutes in libgcc2

2005-02-14 Thread Björn Haase
Hi,

I'd like to discuss with you a topic related to a recent bootstrap failure of 
a couple of smaller embedded targets. The origin of the failure could be 
removed easily. In my opinion the question is simply, what is the best way to 
implement it?

Root of the problem is that libgcc2 presently does not consider target 
dependencies properly for smaller targets: Namely, for several smaller 
targets double is defined to be the same as float, or there possibly might be 
no support for long longs (>= 64 bits). Since libgcc2 explicitly refers to 
DI, DF or DC modes presently, gcc runs into problems when it tries to compile 
functions that use modes that are not implemented by the back-ends.

My suggestion for dealing with this issue is to

1.) change libgcc2.h such that it includes the target-specific tm.h, and
2.) establish a convention that, when the target machine header file defines 
symbols with the standard names

#define TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DI SI
#define TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_TI DI
#define TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_XF SF
#define TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DF SF
#define TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DC SC

the value that these macros expand to should replace the mode in the 
definitions of e.g.

typedef float   DFtype  __attribute__ ((mode (DF)));

Namely, I'd suggest changing libgcc2.h such that the above line would read

#ifndef TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DF
typedef float   DFtype  __attribute__ ((mode (DF)));
#else
typedef float   DFtype  
   __attribute__ ((mode (TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DF)));
#endif
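
As a stand-alone illustration (a sketch that assumes a compiler supporting GCC's mode attribute, with the macro defined locally in place of tm.h), the substitution can be tried out like this:

```c
#include <assert.h>

/* Assumption for this sketch: the target header requests SFmode as
   the substitute for DFmode.  In the proposal this line would live
   in the target's tm.h.  */
#define TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DF SF

#ifndef TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DF
typedef float DFtype __attribute__ ((mode (DF)));
#else
typedef float DFtype
  __attribute__ ((mode (TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DF)));
#endif

typedef float SFtype __attribute__ ((mode (SF)));

/* With the substitute in place, DFtype has the size of SFtype.  */
void
check_dftype (void)
{
  assert (sizeof (DFtype) == sizeof (SFtype));
}
```

The preprocessor replaces the macro before the attribute is parsed, so the compiler only ever sees mode (SF).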

I would be willing to implement it and present a patch for detailed 
discussion. However prior to starting, I'd appreciate comments. 
Looking forward to hearing from you.

Yours,

Björn


Re: RFC: Appropriate method for target-specific mode-substititutes in libgcc2

2005-02-17 Thread Björn Haase
On Thursday, 17 February 2005 03:11, James E Wilson wrote:
> James E Wilson wrote:
> > Björn Haase wrote:
> >> #ifndef TARGET_SPECIFIC_SUBSTITUE_FOR_MODE_DF
>
> I see now that this is PR 19920.  This message would have made more
> sense if you had included that important info.
>
> Anyways, I see that Richard Henderson has added a reasonable fix for it
> now along the lines I suggested, e.g. adding ifdefs to control when the
> bigger modes are used.
Indeed, with Richard Henderson's patches the problem for the h8 and the avr 
targets no longer exists. Nevertheless, thank you for your answer. I'll be 
more explicit about the exact PR number next time.

Yours,

Björn