defunct fortran built by default for cross-compiler

2006-11-01 Thread Joern RENNECKE
When I configure an sh-elf cross tool chain without a specific --enable-languages option (which used to work with gcc 4.2), I get:
The following languages will be built: c,c++,fortran,java,objc
*** This configuration is not supported in the following subdirectories:
target-libmudflap target-libgomp target-libgloss target-libffi target-zlib
target-libjava target-libada gnattools target-boehm-gc
   (Any other directories should still work fine.)

With literally more than ten thousand lines of error messages per multilib for fortran, that makes the test results unreportable.


Re: defunct fortran built by default for cross-compiler

2006-11-01 Thread Joern RENNECKE

Steven Bosscher wrote:

So you don't report any error messages at all and leave us guessing?


AFAIK fortran is not supposed to be configured at all for a cross-compiler.
Or has that changed recently?  It was certainly not configured in my previous
builds of gcc 4.2 snapshots.


Re: defunct fortran built by default for cross-compiler

2006-11-06 Thread Joern RENNECKE

Joern Rennecke wrote:

It appears that most of the errors are of the form:
collect-ld: cannot find -lgfortranbegin


I've found that the problem was related to configure deciding to build
fortran and enable runtime tests for it when doing check-gcc even though
libgfortran was not present; I had made my script remove that some time
ago because libgfortran was not supported.

When I tried to configure with libgfortran present and add the make
target all-target-libgfortran, I get after about three hours:

/mnt/scratch/nightly/2006-11-02-softfp/sh-elf/./gcc/xgcc 
-B/mnt/scratch/nightly/2006-11-02-softfp/sh-elf/./gcc/ -nostdinc 
-B/mnt/scratch/nightly/2006-11-02-softfp/sh-elf/sh-multi-elf/newlib/ 
-isystem 
/mnt/scratch/nightly/2006-11-02-softfp/sh-elf/sh-multi-elf/newlib/targ-include 
-isystem /mnt/scratch/nightly/2006-11-02-softfp/srcw/newlib/libc/include 
-B/usr/local/sh-multi-elf/bin/ -B/usr/local/sh-multi-elf/lib/ -isystem 
/usr/local/sh-multi-elf/include -isystem 
/usr/local/sh-multi-elf/sys-include 
-L/mnt/scratch/nightly/2006-11-02-softfp/sh-elf/./ld -DHAVE_CONFIG_H -I. 
-I../../../srcw/libgfortran -I. -iquote../../../srcw/libgfortran/io 
-I../../../srcw/libgfortran/../gcc 
-I../../../srcw/libgfortran/../gcc/config -I../.././gcc -D_GNU_SOURCE 
-std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition -Wextra -Wwrite-strings -O2 -g -O2 -c 
../../../srcw/libgfortran/runtime/error.c -o error.o

../../../srcw/libgfortran/runtime/error.c: In function 'show_locus':
../../../srcw/libgfortran/runtime/error.c:288: warning: format '%d' expects type 'int', but argument 2 has type 'GFC_INTEGER_4'
../../../srcw/libgfortran/runtime/error.c: At top level:
../../../srcw/libgfortran/runtime/error.c:334: error: 
'_gfortran_runtime_error' aliased to undefined symbol 
'__gfortrani_runtime_error'

make[2]: *** [error.lo] Error 1
make[2]: Leaving directory 
`/mnt/scratch/nightly/2006-11-02-softfp/sh-elf/sh-multi-elf/libgfortran'

make[1]: *** [all] Error 2
make[1]: Leaving directory 
`/mnt/scratch/nightly/2006-11-02-softfp/sh-elf/sh-multi-elf/libgfortran'

make: *** [all-target-libgfortran] Error 2

So, it appears the only way to do a regression test now is to hard-code
with --enable-languages the set of languages that are known to generally
work, i.e. c, c++ and objc.





Re: Canonical type nodes, or, comptypes considered harmful

2006-11-09 Thread Joern RENNECKE
> I can dig out actual real live numbers, if you're curious. For example,
when calling comptypes, the no answers are (were) 34x more likely than yes
answers. If you cannot return false immediately when pointer_to_type1 !=
pointer_to_type2, you then have to run a structural equality tester, and
once you do that, you spend 120ns per depth in the tree as you fault
everything into cache; what's that, some 300 instructions. 21,980 were fast,
336,523 were slow; the slow path dominated.


I think in order to handle the C type system with the non-transitive
type compatibility effectively, for each type we have to pre-compute
the most general variant, even if that has no direct representative in
the current program.
I.e. for an array, point to the corresponding incomplete array.
(Fortunately, C allows only one dimension to be incomplete.)
For a pointer to a struct, point to the type where the struct type is
incomplete.
If an array appears in a context of another type where an incomplete
array is not allowed, we can use the complete array for computing the
most general variant of that other type.

Types can only be compatible if their most general variants are equal.
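
For illustration, here is the non-transitive C compatibility case in source
form (a minimal example written for this discussion, not from the original
mail):

/* Both complete array types are compatible with the incomplete array
   type - their most general variant - but not with each other.  */
int (*p10)[10];
int (*p5)[5];
int (*pinc)[];                /* most general variant of both */

void example (void)
{
  pinc = p10;                 /* OK: int (*)[10] is compatible with int (*)[] */
  pinc = p5;                  /* OK: int (*)[5]  is compatible with int (*)[] */
  /* p10 = p5; */             /* error: int (*)[5] and int (*)[10] are not compatible */
}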

In addition to this most generalized type, each complete type can also
have a pointer to a representative of its equivalence class, and be
flagged as complete; two complete types are compatible iff they are the
same.

If a type is not in the same equivalence class as its most general variant,
it needs to describe all the 'optional' bits, i.e. struct types pointed to,
array dimensions, cv-qualifiers.  


I'm not sure if this is better done by having all the semantics there
(that can be a win if there are lots of places where cv-qualifiers could
be added without breaking type compatibility, but not many cv-qualifiers
are actually encountered),
or if it should only contain a bare data field for each item (e.g. an
integer for an array dimension), with the most general variant having
a checklist of how to compare them, and its description of the overall
type saying what the data actually means when it comes to operating on the
type.



Re: Canonical type nodes, or, comptypes considered harmful

2006-11-10 Thread Joern RENNECKE

Mike Stump wrote:

Now, what are the benefits and weaknesses between mine and yours?  You
don't have to carry around type_context the way mine would, that's a
big win.  You don't have to do anything special to move a reference to a
type around, that's a big win.  You have to do a structural walk if
there are any bits that are used for type equality.


No, these bits can be placed together - a structural walk is only
necessary when (some of) these bits themselves need more scrutiny, i.e.
on at least one of the sides some of the constituent parts are partially
incomplete.  And I can't see how you can avoid that complexity.


  In my scheme, I  don't have to.  I just have a vector of items, they 
are right next to  each other, in the same cache line.


Again, the equality of the items might not be trivial.


libgfortran still fails to build for sh-elf

2006-11-13 Thread Joern RENNECKE
/home/amylaar/bld/2006-11-10/sh-elf-multi/./gcc/xgcc 
-B/home/amylaar/bld/2006-11-10/sh-elf-multi/./gcc/ -nostdinc 
-B/home/amylaar/bld/2006-11-10/sh-elf-multi/sh-multi-elf/newlib/ 
-isystem 
/home/amylaar/bld/2006-11-10/sh-elf-multi/sh-multi-elf/newlib/targ-include 
-isystem /home/amylaar/bld/2006-11-10/srcw/newlib/libc/include 
-B/usr/local/sh-multi-elf/bin/ -B/usr/local/sh-multi-elf/lib/ -isystem 
/usr/local/sh-multi-elf/include -isystem 
/usr/local/sh-multi-elf/sys-include 
-L/home/amylaar/bld/2006-11-10/sh-elf-multi/./ld -DHAVE_CONFIG_H -I. 
-I../../../srcw/libgfortran -I. -iquote../../../srcw/libgfortran/io 
-I../../../srcw/libgfortran/../gcc 
-I../../../srcw/libgfortran/../gcc/config -I../.././gcc -D_GNU_SOURCE 
-std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition -Wextra -Wwrite-strings -O2 -g -O2 -c 
../../../srcw/libgfortran/runtime/error.c -o error.o

../../../srcw/libgfortran/runtime/error.c: In function 'show_locus':
../../../srcw/libgfortran/runtime/error.c:288: warning: format '%d' 
expects type 'int', but argument 2 has type 'GFC_INTEGER_4'

../../../srcw/libgfortran/runtime/error.c: At top level:
../../../srcw/libgfortran/runtime/error.c:334: error: 
'_gfortran_runtime_error' aliased to undefined symbol 
'__gfortrani_runtime_error'

make[2]: *** [error.lo] Error 1



Re: libgfortran still fails to build for sh-elf

2006-11-15 Thread Joern RENNECKE

François-Xavier Coudert wrote:

I suggest that you test the following patch and report back to us:


I got the patch wrong (it's not a real printf function we have there):

Index: libgfortran/runtime/error.c
===
--- libgfortran/runtime/error.c (revision 118806)
+++ libgfortran/runtime/error.c (working copy)
@@ -285,7 +285,7 @@
  if (!options.locus || cmp == NULL || cmp->filename == NULL)
return;

-  st_printf ("At line %d of file %s\n", cmp->line, cmp->filename);
+  st_printf ("At line %d of file %s\n", (int) cmp->line, cmp->filename);
}


That still leaves the undefined symbol error:

/home/amylaar/bld/2006-11-10/sh-multi-elf-f/./gcc/xgcc 
-B/home/amylaar/bld/2006-11-10/sh-multi-elf-f/./gcc/ -nostdinc 
-B/home/amylaar/bld/2006-11-10/sh-multi-elf-f/sh-multi-elf/newlib/ 
-isystem 
/home/amylaar/bld/2006-11-10/sh-multi-elf-f/sh-multi-elf/newlib/targ-include 
-isystem /home/amylaar/bld/2006-11-10/srcw/newlib/libc/include 
-B/usr/local/sh-multi-elf/bin/ -B/usr/local/sh-multi-elf/lib/ -isystem 
/usr/local/sh-multi-elf/include -isystem 
/usr/local/sh-multi-elf/sys-include 
-L/home/amylaar/bld/2006-11-10/sh-multi-elf-f/./ld -DHAVE_CONFIG_H -I. 
-I../../../srcw/libgfortran -I. -iquote../../../srcw/libgfortran/io 
-I../../../srcw/libgfortran/../gcc 
-I../../../srcw/libgfortran/../gcc/config -I../.././gcc -D_GNU_SOURCE 
-std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition -Wextra -Wwrite-strings -O2 -g -O2 -c 
../../../srcw/libgfortran/runtime/error.c -o error.o
../../../srcw/libgfortran/runtime/error.c:334: error: 
'_gfortran_runtime_error' aliased to undefined symbol 
'__gfortrani_runtime_error'

make[2]: *** [error.lo] Error 1
make[2]: Leaving directory 
`/home/amylaar/bld/2006-11-10/sh-multi-elf-f/sh-multi-elf/libgfortran'




Re: 32 bit jump instruction.

2006-12-13 Thread Joern Rennecke
In http://gcc.gnu.org/ml/gcc/2006-12/msg00328.html, you wrote:


>> On 06 Dec 2006 23:13:35 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
>> If you can't afford to lose a register, then I think your only option
>> is to pick some callee-saved register and have each branch instruction
>> explicitly clobber it.  Then it will be available for use in a long
>> branch, and it will be available for use within a basic block.  This
>> is far from ideal, but I don't know a better way to handle it within
>> gcc's current framework.

> Can i get more clarity on this part. Is it implemented in any other backends?

> When you say "pick some callee-saved register ", is it to pick them
> randomly from an available set in CALL_USED_REGISTERS or a specific
> register.

The SH does register scavenging, and sharing of far branches.  Look at
config/sh/sh.c:split_branches .  Also see PR 29336 for how this could be better
integrated with machine-specific constant pool placement.

However, because the SH has delayed branches, there is always a guaranteed way
to find a register - one can be saved, and then be restored in the delay slot.
An architecture without delay slots would have to have another fallback
mechanism, e.g. inserting a register restore before the target - possibly with
a short jump around it - duplicating instructions from the target until a register
dies, or inserting a register restore and jump in the vicinity of the target.


Re: RFC: vectorizer cost model

2007-02-20 Thread Joern Rennecke
> As a first step, to stay on the conservative side, it makes sense to
> consider the scalar cost of the smaller block while calculating the scalar cost.
> Note, the smaller block may not exist.

I think that this should be considered quite common.

We should base the weights of the costs of the two blocks on branch
probability and predictability.
However, one tm interface we are currently missing is one to describe
the branch cost as dependent on the branch probability.
If a branch is taken most of the time, it should be well predictable, so
its cost should be low on targets that have cheap predictable branches.
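
A very rough sketch of what such an interface could look like; no such hook
exists, and the name, the 90/10 threshold and the cost parameters below are
invented purely for illustration:

#define REG_BR_PROB_BASE 10000   /* probability scale used in GCC's REG_BR_PROB notes */

/* Hypothetical hook: return the cost of a conditional branch taken with
   probability PROB.  A strongly biased branch is assumed to predict well,
   so it gets the cheaper cost.  */
static int
branch_cost_for_probability (int prob, int predictable_cost,
                             int unpredictable_cost)
{
  if (prob >= REG_BR_PROB_BASE * 9 / 10 || prob <= REG_BR_PROB_BASE / 10)
    return predictable_cost;
  return unpredictable_cost;
}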


Re: RS6000 call pattern clobbers

2007-02-26 Thread Joern Rennecke
> > Do you remember why you wrote the call patterns this way?  Was
> > there a problem with reload and clobbers of hard registers in a register
> > class containing a single register or some other historical quirk?
> 
> I think the former.  I no longer remember the details, but if you had
> a clobber of a hard reg, there were a number of things that such a hard
> reg couldn't be used for (this is where the details are murky) and in order
> to avoid that problem a match_scratch was used to delay the explicit hard
> register usage as long as possible.

Actually, this depended on SMALL_REGISTER_CLASSES.  If the target was not
marked as having SMALL_REGISTER_CLASSES, any explicitly used hard register
would not be used for register and/or spill allocations anywhere in the
function.


Re: new auto-inc-dec pass

2007-03-06 Thread Joern Rennecke
In http://gcc.gnu.org/ml/gcc/2007-03/msg00128.html, you wrote:
> One case is about multiple increments, the tree optimizer merges them and
> increments the register only once, so if one only looks at the size of the
> pointer value one misses them, e.g. something like this:
>
>   (set (mem (reg)) (x))
>   (set (mem (plus (reg) (const_int 4))) (x))
>   (set (reg) (plus (reg) (const_int 8))

The patches attached to PR20211 handle this case.



RFC: integer division by multiply with invariant reciprocal

2007-03-25 Thread Joern Rennecke
The strategy that the SHMEDIA port uses to do cse and loop invariant code
motion for division by invariant reciprocal can in principle be used by
any processor that has reasonably fast instructions for counting leading
sign bits (can be substituted with counting leading zero bits), widening
or highpart multiply, and dynamic shifts, and that typically allows
adding a few more pseudo registers without much cost (i.e. 32 GPRs,
or efficient stack access like on x86).

Basically, the division is split into an invariant computation
which takes the divisor as input, and computes a denormalization shift
count (SHIFT), an approximate reciprocal factor (INV1), and a reciprocal
adjustment factor (INV2).
The division is then performed by multiplying INV2 with the dividend,
shifting that right by a constant amount, subtracting it from (or adding
it to) the product of INV1 with the dividend, denormalizing the result, and
doing an adjustment to take account of the different rounding for signed /
unsigned results.
The details of how best to compute SHIFT, INV1 and INV2, whether INV2
has the same or the opposite sign of INV1, and the constant shift count
depend on the target instruction set and microarchitecture.
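
For reference, here is a generic, self-contained sketch of the underlying
idea - division by an invariant divisor via a precomputed reciprocal and
shift.  This is a textbook-style unsigned variant, not the SH5 sequence
described above; all names are invented, and it uses GCC's __uint128_t and
__builtin_clz for brevity:

#include <stdint.h>

struct inv_div { uint64_t mul; unsigned shift; };

/* Invariant part: precompute once per divisor D (D != 0).
   mul = ceil (2^(32+shift) / D), which always fits in 64 bits.  */
static struct inv_div
precompute_reciprocal (uint32_t d)
{
  struct inv_div r;
  r.shift = 32 - __builtin_clz (d);   /* floor (log2 (d)) + 1, so d <= 2^shift */
  r.mul = (uint64_t) ((((__uint128_t) 1 << (32 + r.shift)) + d - 1) / d);
  return r;
}

/* Per-division part: one widening multiply and one shift replace the
   divide; this is exact for all 32-bit X.  */
static uint32_t
divide_by_reciprocal (uint32_t x, struct inv_div r)
{
  return (uint32_t) (((__uint128_t) x * r.mul) >> (32 + r.shift));
}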

However, not all processors are as good as the SH5 in computing the
invariant values compared to how fast they can do a straight division;
thus instead of chopping the division into multiple RTL pieces and then
allowing any kind of CSE and PRE to rip them apart, we would like some
finer control that requires a minimum number of divisions with the same
divisor before this optimization is applied.
Also, exposing this at the tree level will enable other optimizations to
work better, e.g. they can make better unrolling decisions, and take
advantage of the non-trapping nature of the division by multiply with
reciprocal.

The question is now how best to represent the invariant operations as
trees.
We could have a special tree code for this, but then
- It would eat up a tree code.
- Producing three results at once, it is likely to be trouble for
  ssa transformations.

Having a single built-in function for this purpose would avoid the tree
code cost, but then we'd be faced with a three-valued built-in function.

So I think that the easiest way to integrate this with the rest of the
compiler is to have a target hook that emits trees to compute SHIFT, INV1
and INV2.  These might use additional temporaries, and could use standard
arithmetic and/or memory operations, machine-specific builtin functions, or
a mixture of both; but any one tree expression would compute only one
value.
Emitting the computation of the actual division using SHIFT, INV1, INV2 is
then done by a second target hook.
The parameter for the minimum number of divisions with the same divisor
to trigger this optimization should be separate from the one for floating
point.
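
Purely as an illustration of the shape such an interface could take (neither
hook exists in GCC; the names and signatures below are invented, and "tree"
stands for GCC's generic tree pointer type):

/* Hook 1: emit statements (appended to *STMT_LIST) computing the invariant
   values for DIVISOR, returning them through *SHIFT, *INV1 and *INV2.
   Each emitted expression computes exactly one value.  */
void targetm_setup_reciprocal_division (tree divisor, tree *shift,
                                        tree *inv1, tree *inv2,
                                        tree *stmt_list);

/* Hook 2: emit the per-division code using the precomputed values and
   return the expression holding DIVIDEND / DIVISOR.  */
tree targetm_expand_reciprocal_division (tree dividend, tree shift,
                                         tree inv1, tree inv2,
                                         tree *stmt_list);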


Re: core changes for mep port

2007-03-29 Thread Joern Rennecke
In http://gcc.gnu.org/ml/gcc/2007-03/msg01007.html, Steven Bosscher wrote:
> All of this feels (to me anyway) like adding a lot of code to the
> middle end to support MEP specific arch features.  I understand it is
> in the mission statement that more ports is a goal for GCC, but I
> wonder if this set of changes is worth the maintenance burden...

The ARC also has an optional SIMD coprocessor, for which it would be useful
to be able to specify that specific operations should be done on the
coprocessor.

Moreover, our current register class preferencing and mode tying heuristics
are somewhat weak, and next-to-useless in the first scheduling pass; if
we'll have specific modes for computations on the coprocessor, I think it
will be easier for the compiler to automatically make sure that
computations that feed into or depend on other computations on the
coprocessor are also done on the coprocessor.

This could even help more traditional processors.
I remember for SHmedia, the floating point registers can hold integer values,
but when you tell the compiler the facts straight as in cost of
moves between various register classes, you'll end up with lots of moves
of integer values between integer and floating point registers.  You
actually have to fiddle the cost to pretend that movsi_media alternatives
involving only floating point registers are more costly than they actually
are to avoid this pessimization.
I suppose a similar scenario might be true for integer operations in
floating point registers on x86 with MMX and its successors.
Using separate modes for integer computations in floating point registers
on these processors could help gcc to model the cost of transferring values
between integer and floating point units.
It also can make TRULY_NOOP_CONVERSION more relevant, as the answer is
often different between integer and floating point registers.


Re: Hot and Cold Partitioning (Was: GCC 4.1 Projects)

2005-02-28 Thread Joern RENNECKE
Dale Johannesen wrote:

Well, no, what is supposed to happen (I haven't tried it for a while,
so I don't promise this still works) is code like this:

.hotsection:
loop:
  conditional branch (i?==1000) to L2
L1:
  /* do stuff */
end loop:
/* still in hot section  */
L2:  jmp L3
.coldsection:
L3:
  i = 0;
  jmp L1
Well, even then, using the cold section can increase the hot section size,
depending on the target, and for some targets on the maximum supported
distance of the cold section.

For SH, using the cold section, you get (for non-PIC):
L2: mov.l 0f,rn
   jmp @rn
   nop
   .balign 4
0:  .long L3
   .coldsection:
L3: mov.l 0f,rn
   jmp @rn
   mov #0,rn
   .balign 4
0:  .long L1
I.e. 10 to 12 bytes each in the hot and cold sections.
Without the cold section, you need only 4 bytes:
L2: bra L1
   mov #0,rn
Note also that in order to avoid the condjump-around-jump syndrome, L2 has
to be within about +-256 bytes of the condjump.
Should I do custom basic block reordering in machine_dependent_reorg to
clean up the turds of hot and cold partitioning?
  


Re: Hot and Cold Partitioning (Was: GCC 4.1 Projects)

2005-02-28 Thread Joern RENNECKE
Dale Johannesen wrote:

Certainly.  In general it will make the total size bigger, as does inlining.
If you have good information about what's hot and cold, it should reduce the
number of pages that actually get swapped in.  The information has to be
good, though, as a branch from hot<->cold section becomes more expensive.
I'd recommend it only if you have profiling data (this is a known winner on
Spec in that situation).

Should I do custom basic block reordering in machine_dependent_reorg to
clean up the turds of hot and cold partitioning?

No, you should not turn on partitioning in situations where code size is
important to you.

You are missing the point.  In my example, with perfect profiling data, you
still end up with more code in the hot section, i.e. more pages are actually
swapped in.  A block should not be put in the cold section unless it is
larger than a jump into the cold section.



Re: Hot and Cold Partitioning (Was: GCC 4.1 Projects)

2005-02-28 Thread Joern RENNECKE
Dale Johannesen wrote:

No, you should not turn on partitioning in situations where code size is
important to you.

You are missing the point.  In my example, with perfect profiling data, you
still end up with more code in the hot section,

Yes.

i.e. more pages are actually swapped in.

Unless the cross-section branch is actually executed, there's no reason the
unconditional jumps should get paged in, so this doesn't follow.
If you separate the unconditional jumps from the rest of the function, you
have just created a per-function cold section.  Except for corner cases,
there would have to be a lot of them to save a page of working set.  And if
you have that many, it will mean that the condjump can't reach.  And it is
still utterly pointless to put blocks into the inter-function cold section
if that only makes the intra-function cold section larger.

So we've come from 4 bytes, one cycle:
bf 0f
mov #0,rn
over 6 bytes, BR issue slot during one cycle:
bt L2
L1:
..
L2:
bra L1
mov #0,rn
to 10 bytes in hot part of the hot section, 12 bytes in cold part of the hot
section, and another 10 to 12 bytes in the cold section, while the execution
time in the hot path is now two cycles (if we manage to get a good
schedule, we might execute two other instructions in these cycles, but still,
this is no better than we started out with):

.hotsection:
bf L2
mov.w 0f,rn
braf @rn
nop
0: .word L2-0b
L1:
...
L2:
mov.l 0f,rn
jmp @rn
nop
.balign 4
0: .long L3
.coldsection
L3:
mov.l 0f,rn
jmp @rn
mov #0,rn
.balign 4
0: .long L1



Re: Questions about trampolines

2005-03-16 Thread Joern RENNECKE
>> Any alternatives that would work for Harvard Architecture devices such as the AVR would be welcome.

There are no alternatives that do not have an overhead in the case where
pointers to nested functions are *not* used, which seems unacceptable in
C. You could introduce some kind of pragma for a special kind of pointer
I suppose, but it seems the feature is so little used in C that this would
be overkill.
That is not true.  You can allocate the trampolines from a pool of ready-made
trampolines, i.e. pieces of code in code space that each use two pointers in
data space which hold the actual function pointer and the static chain.
These can be provided in a separate module of the static libgcc, together
with allocation and deallocation of individual trampolines from the pool
(the latter has to be called from the epilogue of functions that initialize
(and thus allocate) trampolines).  If your pool is small, you might run out
of trampolines before you run out of stack, but for all intents and purposes
you have a stack overflow then.
You can have special libraries to be linked before libgcc which provide a
larger pool of trampolines for programs that need that.
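
As a concrete illustration of the pool idea, here is a minimal, hypothetical
sketch of the data side of such a libgcc module.  The names, the pool size
and the per-stub code size are invented, and the ready-made code stubs
themselves would live in a target-specific assembly file, referenced here
only through __tramp_code_start:

#include <stdlib.h>

struct tramp_slot { void (*fn) (void); void *static_chain; };

#define TRAMP_POOL_SIZE 32
#define TRAMP_CODE_SIZE 16               /* bytes per ready-made code stub */

extern char __tramp_code_start[];        /* start of the stubs in code space */
static struct tramp_slot tramp_pool[TRAMP_POOL_SIZE];
static unsigned tramp_top;               /* stack-like allocation, single-threaded */

void *
__tramp_alloc (void (*fn) (void), void *static_chain)
{
  if (tramp_top >= TRAMP_POOL_SIZE)
    abort ();                            /* effectively a stack overflow */
  tramp_pool[tramp_top].fn = fn;
  tramp_pool[tramp_top].static_chain = static_chain;
  return __tramp_code_start + tramp_top++ * TRAMP_CODE_SIZE;
}

/* Called from the epilogue of each function that allocated trampolines.  */
void
__tramp_free (unsigned count)
{
  tramp_top -= count;
}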
On embedded targets, a large pool of trampolines can be provided with
incomplete address decoding for a ROM that provides a single trampoline code
part, using the code address to index into the pointer data.  The data might
be dynamically allocated.  For targets with an MMU, a single page of
trampoline code parts can be mapped multiple times as required.




Re: Questions about trampolines

2005-03-16 Thread Joern RENNECKE
Clifford Wolf wrote:
Hi,
On Wed, Mar 16, 2005 at 01:50:32PM +, Joern RENNECKE wrote:

These can be provided in a separate module of the static libgcc, together
with allocation and deallocation of individual trampolines from the pool
(the latter has to be called from the epilogue of functions that initialize
(and thus allocate) trampolines).

I also thought about that already.
It's a good idea, but not good enough.  :-(
What about longjmp(), gotos from nested functions to their surrounding
functions (with a complex call tree in between) and (C++) exceptions?
Always preserving frame pointers, unwinding the stack frame by frame and
executing the epilogues might be possible - but I don't think that we
really want to go this way..
In a single-threaded environment, you treat the trampoline pool as a stack,
and when you allocate a new one, check first if there are any trampolines
left at the top that have a static chain pointer that points into the
deallocated data stack.

In (possibly) multithreaded environments, you could treat the trampolines as
objects with a destructor.  Exception handling already knows how to call
destructors while unwinding the stack.
  



Re: Questions about trampolines

2005-03-17 Thread Joern RENNECKE
Clifford Wolf wrote:

Hmm.. what about doing it gc-like?  Instead of a stack there simply is a
'pool' of trampolines from which trampolines are allocated and a pointer to
the trampoline is pushed on the stack.
When the last trampoline from the pool is allocated, a 'garbage collector'
runs over it, looking for pointers to trampolines between the stack pointer
and the stack start address.  Every trampoline which isn't possibly
referenced is added to a free-list from which new trampolines are allocated.
If you have only one processor stack (i.e. single-threaded execution), you
can handle the trampolines as a stack too.  You don't need to deallocate
till you allocate again, and then you adjust the trampoline stack so none of
its static chain pointers points to a deallocated frame, or to the current
frame (since you are only about to set up the trampolines for the current
frame then).

If you have multiple processor stacks, you have to register and later search
them all in order to make the garbage-collection scheme work.

Instead of adding the trampoline pool to libgcc (as suggested earlier in
this thread) I would suggest that gcc generates a trampoline pool in a
linkonce section every time a source file is compiled which requires
trampolines. That way there wouldn't be any trampoline pool in an
executable which doesn't need one.

You don't need a linkonce section for this.  The function that needs a
trampoline calls allocation / deallocation functions, or if it inlines the
code, it will reference the pool start addresses - either way, it will
reference some symbols.  By putting the .o file that provides these symbols
along with the code and data parts of the trampoline pool into a static
library - libgcc.a or otherwise - you make sure that the object is only
linked in when needed.

and a compiler option such as
-ftrampoline-pool-size=32 could be used to specify the size of the
trampoline pool on the command line.
This is messy; say you have two libraries that are compiled with
-ftrampoline-pool-size=32; they will then share a trampoline pool of 32
entries.  If you compile one with -ftrampoline-pool-size=16 instead, you
will have them using different pools, or maybe even get some multiply
defined symbols.
It is much saner to make this a link time option.  By selecting a specific
library for the trampoline pool, you can adjust the size on a per-program
(or per-dso, if you don't export) basis, and you might even choose an
alternate allocation strategy.  I.e. you could have libgcc provide one with
a size that works most of the time and uses destructors for portability and
robustness, have a specialized lightweight one you can specifically use for
single-threaded programs, and have a 64-bit Linux specific one that ties
into the threading code (or is part of a threads package) and mmaps
trampoline code pages for every processor stack allocated, sufficiently
large and at a fixed offset to the stack so that you can put the data part
on the return stack in any suitably aligned position, and have a matching
trampoline.

I.e. the bare function address and the static chain pointer are 8 bytes
each, so that a trampoline data part is 16 bytes.  You require them to be
16-byte aligned on any processor stack.
The mmapped trampoline can be an absolute function call to some helper code
that does the real work, using the return address to figure out which
trampoline is executed.  This call should fit into 16 bytes too, so in the
trampoline page to be mmapped, every 16 bytes there is such an absolute call
insn.  You can get a 1:1 correspondence between trampolines and processor
stacks by allocating the stacks all in one specific memory area, and having
an equally-sized area where trampolines are mapped.  Thus, you can have
differently-sized stacks, yet the trampoline code can add a constant offset
to the return address to find the data part of the trampoline.
  




Re: Questions about trampolines

2005-03-17 Thread Joern RENNECKE
Clifford Wolf wrote:

Some applications have recursions which go into a depth of 1000 and more.
Some architectures have only a few k of RAM.  Which "size that works most of
the time" would you suggest?
It's ugly to have a static pool size.  But it's intolerable to not allow the
user to change that pool size easily using an option.

Of course the user can change the size, by using a library with a different
size.  But there should be a sensible default.  The size of that default can
vary from target to target.

The mmapped trampoline can be an absolute function call to some helper
code that does the

I am pretty sure that all processor architectures with such a strict Harvard
design that it is impossible to generate dynamic code are MMU-less.
The application of the MMU-based scheme is more to accelerate trampolines by
avoiding cache coherency issues, without making allocation / deallocation
more expensive.
In fact, since the code is already there, the initialization is cheaper than
for classic stack-based trampolines on pure von Neumann architectures.

FWIW, for processor-stack based trampolines: if we could guarantee that
trampolines are the only code that can be executed on the stack, we could
avoid the memory-Icache coherency issue altogether by allocating entire
cache lines for trampolines on the stack, and filling them up with
trampolines (at least the code part), with a code part that does not change
for any given stack location.  I.e. after writing the code, we'd have to
flush it to memory, but wouldn't need to invalidate the Icache, since the
only old code that could be there would be identical to the code just
written.
  



Re: Questions about trampolines

2005-03-17 Thread Joern RENNECKE
Robert Dewar wrote:
Joern RENNECKE wrote:
Of course the user can change the size, by using a library with a
different size.

This is not an acceptable approach in a production environment,
where switching libraries can force revalidation and retesting.

This sounds more like a problem with your process than a genuine technical
problem.
Why should an option that selects a different library be less safe than an
option that changes code generation?
But if you really want to, you can of course select a different module out
of the same library, by playing with --defsym.



Re: Questions about trampolines

2005-03-18 Thread Joern RENNECKE
Robert Dewar wrote:
Joern RENNECKE wrote:
You need to be able to set the value of a parameter over a widely
varying range, what makes you think you can pick two values that
will cover all cases, or 4 or 6 for that matter.

It will likely cover most, but not all cases.  With 12 values, you can cover
the range from 64 to 1073741824 pool entries, if you allow for up to four
times more entries to be provided than actually desired.
In order to allow specifying the exact size of the pool, you can provide the
source of the library that implements it, and have the application
programmer compile it with a -D flag which determines the desired size of
the pool.  This can also be driven by a specs file so that all the
programmer does is supply one option with a numerical parameter during
'linking'.




RFC: ms bitfields of aligned basetypes

2005-04-19 Thread Joern RENNECKE
t001_x of the struct-layout test has such beauties as:
typedef _Bool Tal16bool __attribute__((aligned (16)));
struct S49 { Tal16bool a:1; } ;
Here, a only gets BIGGEST_ALIGNMENT (i.e. 64 bits), rather than the 128 bits
required for Tal16bool.  Should we enforce that any storage element allocated
for a run of ms-bitfields gets the full alignment of the basetype, even when
it exceeds the size of the basetype and BIGGEST_ALIGNMENT?



Re: RFC: ms bitfields of aligned basetypes

2005-04-25 Thread Joern RENNECKE
Danny Smith wrote:
Jim Wilson wrote http://gcc.gnu.org/ml/gcc/2005-04/msg01172.html

Joern RENNECKE wrote:

required for Tal16bool.  Should we enforce that any storage element
allocated for a run of ms-bitfields get the full alignment of the basetype,
even when it exceeds the size of the basetype and of BIGGEST_ALIGNMENT?

Obviously, we should do the exact same thing that the microsoft compiler
does.  That is the whole point of -mms-bitfields.
If we can't generate an equivalent testcase for the microsoft compiler,
because it doesn't have aligned attributes or equivalent, then we can do
whatever seems to make sense.

I believe the MS equivalent is __declspec (align (16)).  Could you test
the following patch to i386/cygming.h to see if you come closer to MS
behaviour (I don't have a MS compiler handy).

Sorry, my time is currently taken up by some other work.
The background of my query was this: while doing regression testing on the
sh-elf-4_1 branch, I found that some of the superficial struct-layout
regressions were actually due to use of uninitialized memory in the
ms-bitfield code, as reported before in PR middle-end/20371.
After applying the patch from December to the branch, I found some new
regressions, which led me to write an additional patch, which can be found
in the PR as an attachment from the 15th of April.  This new patch contains
a gcc_assert that checks that actual_align >= type_align whenever storage
for a new run of bitfields is allocated.
This assert triggered for a number of the struct-layout cases, which led to
my query.  I found that a run of bitfields was allocated to only 64-bit
alignment, even though the type said it should be 128-bit aligned (although
based on bool).

I don't have a cygwin test platform; I have been doing sh64-elf regression
tests using the simulator available in the contrib directory of gdb.




Re: RFC: ms bitfields of aligned basetypes

2005-04-28 Thread Joern RENNECKE
A testcase to trigger the assert was:
typedef _Bool Tal16bool __attribute__ ((aligned (16)));
struct S49
{
 Tal16bool a:1;
};
and it turns out that the underlying problem is actually in the general-purpose
field layout code.  Both known_align and actual_align are calculated as
BIGGEST_ALIGNMENT if the offset of the field is zero.  However, the
correct alignment in this case is the alignment of the record, which may be
smaller or larger than BIGGEST_ALIGNMENT, depending on the
alignment of the fields seen so far.



Another ms-bitfield question...

2005-04-28 Thread Joern RENNECKE
t002.x has this code:
typedef unsigned short int Tal16ushort __attribute__ ((aligned (16)));
struct S460
{
 unsigned long int __attribute__ ((packed)) a;
 Tal16ushort __attribute__ ((aligned)) b:((((13) - 1) & 15) + 1);
 unsigned short int c;
};
BIGGEST_ALIGNMENT is 64 for sh64-elf.
Does the ((aligned)) attribute apply to b, to the base type of b, or both
to the base type of b and the base type of the current run of bits?
Currently, I see the record is 128-bit aligned, but the run of bits that b
is allocated from is only 64 bit aligned; this doesn't make any sense.


ppc-eabisim is broken in mainline

2005-05-10 Thread Joern RENNECKE
Between 20050505 and 20050510, the ppc-eabisim configuration was broken.
I'm seeing this error:
/mnt/scratch/nightly/2005-05-10-orv/ppc/./gcc/xgcc 
-B/mnt/scratch/nightly/2005-05-10-orv/ppc/./gcc/ -nostdinc 
-B/mnt/scratch/nightly/2005-05-10-orv/ppc/powerpc-eabisim/newlib/ 
-isystem 
/mnt/scratch/nightly/2005-05-10-orv/ppc/powerpc-eabisim/newlib/targ-include 
-isystem /mnt/scratch/nightly/2005-05-10-orv/srcw/newlib/libc/include 
-B/usr/local/powerpc-eabisim/bin/ -B/usr/local/powerpc-eabisim/lib/ 
-isystem /usr/local/powerpc-eabisim/include -isystem 
/usr/local/powerpc-eabisim/sys-include 
-L/mnt/scratch/nightly/2005-05-10-orv/ppc/./ld -O2  -DIN_GCC 
-DCROSS_COMPILE   -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wold-style-definition  -isystem ./include   -g  
-DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED  -I. -I -I../../srcw/gcc 
-I../../srcw/gcc/ -I../../srcw/gcc/../include 
-I../../srcw/gcc/../libcpp/include  -mrelocatable-lib -mno-eabi 
-mstrict-align -xassembler-with-cpp -c eabi.S -o libgcc/./eabi.o
/mnt/scratch/nightly/2005-05-10-orv/ppc/gcc//: Assembler messages:
/mnt/scratch/nightly/2005-05-10-orv/ppc/gcc//:1: Warning: line numbers 
must be positive; line number 0 rejected
eabi.S:136: Error: Relocation cannot be done when using -mrelocatable
eabi.S:137: Error: Relocation cannot be done when using -mrelocatable
eabi.S:138: Error: Relocation cannot be done when using -mrelocatable
eabi.S:139: Error: Relocation cannot be done when using -mrelocatable
eabi.S:142: Error: Relocation cannot be done when using -mrelocatable
make[2]: *** [libgcc/./eabi.o] Error 1
make[2]: Leaving directory `/mnt/scratch/nightly/2005-05-10-orv/ppc/gcc'
make[1]: *** [stmp-multilib] Error 2
make[1]: Leaving directory `/mnt/scratch/nightly/2005-05-10-orv/ppc/gcc'
make: *** [all-gcc] Error 2
build failed

Some experimentation shows that the assembler from the two dates behaves the
same, and eabi.S is identical, but the preprocessed source files differ.
cc1 used to define _RELOCATABLE, but it does not any more.
The definition used to be:

# 0 ""
#define _RELOCATABLE 1



Re: ppc-eabisim is broken in mainline

2005-05-10 Thread Joern RENNECKE
Andrew Pinski wrote:
On May 10, 2005, at 1:54 PM, Joern RENNECKE wrote:
Between 20050505 and 20050510, the ppc-eabisim configuration was broken.

/mnt/scratch/nightly/2005-05-10-orv/ppc/gcc//:1: Warning: line numbers
must be positive; line number 0 rejected

That is PR 21250 and this has been failing since April 19.

That warning does indeed also appear in the build from 20050505, but it does
not stop the build.  The 'Relocation cannot be done when using
-mrelocatable' error does.
  



mainline bootstrap comparison failure on i686-pc-linux-gnu with gcc 3.2.3 20030502 (Red Hat Linux 3.2.3-49)

2005-05-11 Thread Joern RENNECKE
I'm getting these errors:
Bootstrap comparison failure!
./bt-load.o differs
./expmed.o differs
build/gengtype-lex.o differs
I've picked bt-load.o for a closer look because it was the smallest
of the affected files.  I've found that the register allocation order
for branch_target_load_optimize differs.
Stage 1 cc1, run from the shell, has:
;; 113 regs to allocate: 99 195 194 147 119 120 101 140 192 225 121 660 106
107 110 64 124 659 82 100 144 145 109 224 244 69 65 236 105 180 68 104 204
658 115 89 243 128 661 67 169 63 232 85 112 135 134 108 178 219 233 662 182
103 111 113 190 152 117 102 172 155 206 98 657 93 235 91 274 95 156 83 139
167 188 153 131 171 209 90 141 159 161 114 87 250 132 208 71 88 168 298 66
287 191 210 226 118 252 (2) 248 138 189 154 158 231 196 299 62 187 193 253
251 310

and stage 2 has:
;; 113 regs to allocate: 99 195 194 147 119 120 101 140 192 225 121 660 106
107 110 64 124 659 82 100 144 145 109 224 244 69 65 236 105 180 68 104 204
658 115 89 243 128 661 67 169 63 232 85 112 135 134 108 178 219 233 662 182
103 111 113 190 152 117 102 172 155 206 98 657 93 235 90 274 95 156 83 139
167 188 153 131 91 171 209 141 159 161 114 87 250 132 208 71 88 168 298 66
287 191 210 226 118 252 (2) 248 138 189 154 158 231 196 299 62 187 193 253
251 310

Note the difference of the position of registers 90 and 91.  wdiff puts 
it like this:

[--;;-]{++;;+} 113 regs to allocate: 99 195 194 147 119 120 101 140 192 
225 121 660 106 107
110 64 124 659 82 100 144 145 109 224 244 69 65 236 105 180 68 104 204 
658 115
89 243 128 661 67 169 63 232 85 112 135 134 108 178 219 233 662 182 103 
111 113
190 152 117 102 172 155 206 98 657 93 235 [-90-] {+91+} 274 95 156 83 
139 167 188 153 131 [-91-] 171 209 {+90+} 141 159 161 114 87 250 132 208 
71 88 168 298 66 287 191 210 226 118 252 (2) 248 138 189 154 158 231 196 
299 62 187 193 253 251 310

When I run the stage1 cc1 under gdb control, with the same input arguments,
its register allocation order changes to match the stage2 cc1.  The debug
information in the stage2 cc1 is not very useful, but still it can be
demonstrated that it stays with the same allocation order:
(gdb) break global.c:586
Breakpoint 3 at 0x84627aa: file ../../srcw/gcc/global.c, line 586.
(gdb) cond 3 input_location.line == 1464
(gdb) run
...
Breakpoint 3, global_alloc (file=0x8d75e20) at ../../srcw/gcc/global.c:586
586   if (file)
(gdb) p (( (int *) allocno_order))[219]
$1 = 21
(gdb) p (( (int *) allocno_order))[220]
$2 = 158
(gdb) p (( (int *) allocno_order))[221]
$3 = 25
(gdb) p (( (int *) allocno_order))[222]
$4 = 77
(gdb) p ((int*)allocno)[21*17]
$5 = 90
(gdb) p ((int*)allocno)[158*17]
$6 = 274
(gdb) p ((int*)allocno)[25*17]
$7 = 95
(gdb) p ((int*)allocno)[77*17]
$8 = 156

So, I suppose I have to move to a different gcc version for the 
bootstrap compiler.




Re: mainline bootstrap comparison failure on i686-pc-linux-gnu with gcc 3.2.3 20030502 (Red Hat Linux 3.2.3-49)

2005-05-12 Thread Joern RENNECKE
Andrew Pinski wrote:

Actually it is easy to peek at any of them and you will see that the
tree optimizers (lim, in fact) have changed something somewhere.

The trouble is that I'm running the tests on Red Hat Enterprise Linux, and
even with the address randomization allegedly turned off, most addresses
still end up being random.  So I've looked at differences of dump file sizes
instead, and the first was in the greg dumps.  Still, experimentation with
3.4.3 supports your statement that the mainline code is to blame: I also get
bootstrap comparison failures with 3.4.3 as the bootstrap compiler, in fact
two different sets using two different mainline snapshots:

Bootstrap comparison failure!
./expmed.o differs
build/genattrtab.o differs
build/gengtype-lex.o differs
make[1]: *** [gnucompare] Error 1
and
Bootstrap comparison failure!
./emit-rtl.o differs
./expmed.o differs
build/genattrtab.o differs
make[1]: *** [gnucompare] Error 1


Re: ppc-eabisim is broken in mainline

2005-05-12 Thread Joern RENNECKE
Aldy Hernandez wrote:

* config/rs6000/sysv4.opt (mlittle): Handle.
* config/rs6000/rs6000.c (rs6000_handle_option): Set
target_flags_explicit when appropriate.
 

Thanks, this allowed my build to complete.  It's regression testing now.  


Re: Mainline broken

2005-05-13 Thread Joern RENNECKE
Steven Bosscher wrote:
Seems like your forgot the basic-block.h bits in this commit:
http://gcc.gnu.org/ml/gcc-cvs/2005-05/msg00621.html
Gr.
Steven
 

Sorry.  Checked in now.


sh-elf tree-ssa failure

2005-05-13 Thread Joern RENNECKE
execute/20031215-1.c passes on i686 and ppc, but fails on sh-elf - both 
SH1 big endian and SH4 little endian, eight times each.
It still shows the same failure with mainline from 17:00 UTC today.

Executing on host: /mnt/scratch/nightly/2005-05-11/sh-elf/gcc/xgcc 
-B/mnt/scratch/nightly/2005-05-11/sh-elf/gcc/ 
/mnt/scratch/nightly/2005-05-11/srcw/gcc/testsuite/gcc.c-torture/execute/20031215-1.c  
-w  -O2  -DSTACK_SIZE=16384 -fno-show-column  -isystem 
/mnt/scratch/nightly/2005-05-11/sh-elf/sh-elf/./newlib/targ-include 
-isystem /mnt/scratch/nightly/2005-05-11/srcw/newlib/libc/include  
-L/mnt/scratch/nightly/2005-05-11/sh-elf/ld 
-B/mnt/scratch/nightly/2005-05-11/sh-elf/sh-elf/./newlib/ 
-L/mnt/scratch/nightly/2005-05-11/sh-elf/sh-elf/./newlib  -lm 
-Wl,--defsym,_stack=0xff000  -o 
/mnt/scratch/nightly/2005-05-11/sh-elf/gcc/testsuite/20031215-1.x2
(timeout = 300)
/mnt/scratch/nightly/2005-05-11/srcw/gcc/testsuite/gcc.c-torture/execute/20031215-1.c: 
In function 'test1':^M
/mnt/scratch/nightly/2005-05-11/srcw/gcc/testsuite/gcc.c-torture/execute/20031215-1.c:11: 
error: Statement makes a memory store, but has no V_MAY_DEFS nor 
V_MUST_DEFS^M
#   VUSE ;^M
ao.ch[2] = 0;^M
/mnt/scratch/nightly/2005-05-11/srcw/gcc/testsuite/gcc.c-torture/execute/20031215-1.c:11: 
internal compiler error: verify_ssa failed.^M



Re: sh-elf tree-ssa failure

2005-05-13 Thread Joern RENNECKE
Andrew Pinski wrote:
   
Huh, it fails on all targets.

-- Pinski
Sorry, my bad.  It passes at -O0 and -O1 on all targets, and fails at
-Os etc. on all of them, too.
I incorrectly assumed that eight errors per multilib must mean it fails
across the board.



RFD: what to do about stale REG_EQUAL notes in dead_or_predictable

2005-05-26 Thread Joern RENNECKE

I wonder what best to do about rtl-optimization/21767.

We sometimes have REG_EQUAL notes that are only true when
the instruction stays exactly where it is, like:

(insn 11 10 12 0 (set (reg:SI 147 t)
   (eq:SI (reg/v:SI 159 [ i ])
   (reg:SI 161))) 1 {cmpeqsi_t} (nil)
   (expr_list:REG_EQUAL (eq:SI (reg/v:SI 159 [ i ])
   (const_int 2345678 [0x23cace]))
   (nil)))

(jump_insn 12 11 37 0 (set (pc)
   (if_then_else (eq (reg:SI 147 t)
   (const_int 0 [0x0]))
   (label_ref 17)
   (pc))) 201 {branch_false} (nil)
   (expr_list:REG_BR_PROB (const_int 7100 [0x1bbc])
   (nil)))
;; End of basic block 0, registers live:
(nil)

;; Start of basic block 1, registers live: (nil)
(note 37 12 14 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 14 37 15 1 (set (reg/v:SI 160 [ r ])
   (reg/v:SI 159 [ i ])) 168 {movsi_ie} (nil)
   (expr_list:REG_EQUAL (const_int 2345678 [0x23cace])
   (nil)))

if-conversion changes this to

(insn 11 10 14 0 (set (reg:SI 147 t)
   (eq:SI (reg/v:SI 159 [ i ])
   (reg:SI 161))) 1 {cmpeqsi_t} (nil)
   (expr_list:REG_EQUAL (eq:SI (reg/v:SI 159 [ i ])
   (const_int 2345678 [0x23cace]))
   (nil)))

(insn 14 11 12 0 (set (reg/v:SI 160 [ r ])
   (reg/v:SI 159 [ i ])) 168 {movsi_ie} (nil)
   (expr_list:REG_EQUAL (const_int 2345678 [0x23cace])
   (nil)))

(jump_insn 12 14 38 0 (set (pc)
   (if_then_else (ne (reg:SI 147 t)
   (const_int 0 [0x0]))
   (label_ref:SI 21)
   (pc))) 200 {branch_true} (nil)
   (expr_list:REG_BR_PROB (const_int 2900 [0xb54])
   (nil)))
;; End of basic block 0, registers live:
(nil)

so the REG_EQUAL note on insn 14 is no longer true.
In general, whether a REG_EQUAL note remains valid is not computable.
(Any REG_EQUAL note is trivially valid if its insn is unreachable.)
Even where it is computable, you'd probably need as much complexity
and target-dependent knowledge to prove it as if you were computing
the equality from scratch.

So, I think our main options are to remove all REG_EQUAL notes
of insns that are moved above a branch, or to change the value to reflect
the condition it depends on.  I.e. we could have an UNKNOWN rtx
to describe an unknown value in a REG_EQUAL note (A note
with a bare UNKNOWN value would be meaningless and should be
removed), and then express the note for insn 14 as:

(expr_list:REG_EQUAL (if_then_else (ne (reg:SI 147 t) (const_int 0 [0x0]))
   (const_int 2345678 [0x23cace]) (unknown))
   (nil))
  


RFA: Fix PR21767 (Was: Re: RFD: what to do about stale REG_EQUAL notes in dead_or_predictable)

2005-05-31 Thread Joern RENNECKE

I've tried removing REG_EQUAL notes altogether unless we
know that the source of the move is function invariant, and got
identical assembler for all the EEMBC tests as without the patch.
Likewise for an entire sh4-elf multilibbed libgcc, libstdc++-v3
and newlib build.   I think it is therefore reasonable to assume
that removing the notes doesn't cause any relevant performance
loss.  For a 3.4 based compiler, that was just a small ifcvt.c
patch; for 4.x, I also had to reinstate function_invariant_p  as
a global function.

I can't do an i686-pc-linux-gnu bootstrap at the moment because
the bootstrap fails building libjava, both with unpatched sources
from 12:00 and 16:00 UTC.  I'll do a bootstrap / regression test
of the patch when we are back in bootstrap land.

FWIW, this is the failure:
./../.././gcc/gcjh -classpath '' -bootclasspath . 
java/lang/AbstractMethodError

make[2]: *** [java/lang/AbstractMethodError.h] Segmentation fault
make[2]: *** Deleting file `java/lang/AbstractMethodError.h'
make[2]: Leaving directory 
`/mnt/scratch/nightly/2005-05-31/i686/i686-pc-linux-gnu/libjava'


AFAICS, jcf-io.c:format_uint is miscompiled, base is not handed as
second parameter to umoddi, it uses the saved values of esi/edi
instead.
  
2005-05-27  J"orn Rennecke <[EMAIL PROTECTED]>

* rtl.h (function_invariant_p): Re-add declaration.
* reload1.c (function_invariant_p): No longer static.
* ifcvt.c (dead_or_predicable): Remove REG_EQUAL notes that
might have become invalid.

Index: rtl.h
===
RCS file: /cvs/gcc/gcc/gcc/rtl.h,v
retrieving revision 1.550
diff -p -r1.550 rtl.h
*** rtl.h   19 May 2005 10:38:38 -  1.550
--- rtl.h   31 May 2005 18:11:39 -
*** extern void dbr_schedule (rtx, FILE *);
*** 2062,2067 
--- 2062,2070 
  extern void dump_local_alloc (FILE *);
  extern int local_alloc (void);
  
+ /* In reload1.c */
+ extern int function_invariant_p (rtx);
+ 
  /* In reg-stack.c */
  extern bool reg_to_stack (FILE *);
  
Index: reload1.c
===
RCS file: /cvs/gcc/gcc/gcc/reload1.c,v
retrieving revision 1.471
diff -p -r1.471 reload1.c
*** reload1.c   26 May 2005 05:44:38 -  1.471
--- reload1.c   31 May 2005 18:11:39 -
*** static int reload_reg_free_for_value_p (
*** 405,411 
rtx, rtx, int, int);
  static int free_for_value_p (int, enum machine_mode, int, enum reload_type,
 rtx, rtx, int, int);
- static int function_invariant_p (rtx);
  static int reload_reg_reaches_end_p (unsigned int, int, enum reload_type);
  static int allocate_reload_reg (struct insn_chain *, int, int);
  static int conflicts_with_override (rtx);
--- 405,410 
*** free_for_value_p (int regno, enum machin
*** 4984,4990 
 pic_offset_table_rtx is not, and we must not spill these things to
 memory.  */
  
! static int
  function_invariant_p (rtx x)
  {
if (CONSTANT_P (x))
--- 4983,4989 
 pic_offset_table_rtx is not, and we must not spill these things to
 memory.  */
  
! int
  function_invariant_p (rtx x)
  {
if (CONSTANT_P (x))
Index: ifcvt.c
===
RCS file: /cvs/gcc/gcc/gcc/ifcvt.c,v
retrieving revision 1.189
diff -p -r1.189 ifcvt.c
*** ifcvt.c 29 May 2005 18:56:42 -  1.189
--- ifcvt.c 31 May 2005 18:11:40 -
*** dead_or_predicable (basic_block test_bb,
*** 3430,3441 
--- 3430,3460 
/* Move the insns out of MERGE_BB to before the branch.  */
if (head != NULL)
  {
+   rtx insn;
+ 
if (end == BB_END (merge_bb))
BB_END (merge_bb) = PREV_INSN (head);
  
if (squeeze_notes (&head, &end))
return TRUE;
  
+   /* PR 21767: When moving insns above a conditional branch, REG_EQUAL
+notes might become invalid.  */
+   insn = head;
+   do
+   {
+ rtx note, set;
+ 
+ if (! INSN_P (insn))
+   continue;
+ note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
+ if (! note)
+   continue;
+ set = single_set (insn);
+ if (!set || !function_invariant_p (SET_SRC (set)))
+   remove_note (insn, note);
+   } while (insn != end && (insn = NEXT_INSN (insn)));
+ 
reorder_insns (head, end, PREV_INSN (earliest));
  }
  


Re: Problem with Delayed Branch Scheduling

2005-07-04 Thread Joern RENNECKE

So you have a few instructions bundled into a VLIW instruction, and
one of the instructions in the bundle is moved into the delay slot,
thus breaking your VLIW bundle.  Right?



That is a much harder problem...  I don't think it is really possible
with the existing dbr scheduling pass, but maybe someone else knows a
trick for this...


So the problem is that we represent instructions that don't actually
exist as individual instructions?  I think it is legitimate to use
machine_dependent_reorg to make the actual instructions explicit.

However, in order to do this without exploding the machine description,
you'd probably have to revive match_insn (the one formerly named
match_insn2).




Re: does the instruction combiner regards (foo & 0xff) as a special case?

2005-08-01 Thread Joern RENNECKE

But I found they fail to match

if(foo & 0xff) and if(foo & 0x)


These get simplified to foo.


Look at the debugging dump before the combine pass to see what you
need to match.


It doesn't work that way.  What you get from there are only the insn numbers.

Then you run cc1 (or whatever language-specific compiler you use) under
gdb control, with a breakpoint on the point in try_combine - in this case,
before the first recog_for_combine call - with a condition to match the
insn numbers.  E.g. for breakpoint 5, to match any combination that ends in
insn 42, you say:

cond 5 i3->u.fld[0].rt_int == 42

to match a combination of insns 40, 41 and 42 (and only in exactly that
order):

cond 5 
i3->u.fld[0].rt_int == 42 && i2->u.fld[0].rt_int == 41 && i1 && i1->u.fld[0].rt_int == 40


or for an older codebase:

i3->fld[0].rtint == 42 && i2->fld[0].rtint == 41 && i1 && i1->fld[0].rtint == 40




Re: 206 GCC HEAD regressions, 196 new, with your patch on 2005-08-23T19:50:19Z.

2005-08-26 Thread Joern RENNECKE

Joern Rennecke wrote:

I've started a make check-target-libjava yesterday, in the hope that this
would give me a handle on things, but it's still not finished after 22 hours.


It's still not finished, but in the meantime I modified the code to abort in
the case where the old code would reduce the alignment for !STRICT_ALIGNMENT
targets.
This turned up something rather ugly indeed:

java uses char_type_node for its character type, which is 16 bits.
gcc/java/decl.c:747 java_init_decl_processing:
 TYPE_PRECISION (char_type_node) = 16;

On the other hand, tree.c uses char_type_node as the type of the smallest
addressable unit:

tree.c:489 make_node_stat
   case tcc_type:
 TYPE_UID (t) = next_type_uid++;
 TYPE_ALIGN (t) = char_type_node ? TYPE_ALIGN (char_type_node) : 0;





java type layout (Was: Re: 206 GCC HEAD regressions, 196 new, with your patch on 2005-08-23T19:50:19Z.)

2005-08-26 Thread Joern RENNECKE

Joern RENNECKE wrote:

 
java uses char_type_node for its character type, which is 16 bits.

gcc/java/decl.c:747 java_init_decl_processing:
 TYPE_PRECISION (char_type_node) = 16;

On the other hand, tree.c uses char_type_node as the type of the smallest
addressable unit:

tree.c:489 make_node_stat
   case tcc_type:
 TYPE_UID (t) = next_type_uid++;
 TYPE_ALIGN (t) = char_type_node ? TYPE_ALIGN (char_type_node) : 0;




Could someone with a bit more java experience than me verify how we lay out
structs, unions and arrays with a single bool member?  I suspect that even
without my patch, they might be 8 bits large and 16-bit aligned.


Re: 4.2 Project: "@file" support

2005-08-30 Thread Joern RENNECKE

> applications will just work, but introducing the very serious risk of
> security problems, leading to, say:
>
> gcc: dj:yourpassword:1234:567:DJ: invalid argument
>
> instead of
>
> gcc: @/etc/passwd: invalid argument

If you want to use gcc to read a file, you get a closer likeness
to the data with:

gcc -E -C -x c /etc/passwd

Polluting the argument name space seems more worrying.  A long option would
not create such ambiguities.


MEM_NONTRAP_P and push/pop alias set (Was: Re: [attn port maintainers] fix 23671)

2005-09-01 Thread Joern RENNECKE

My current thinking is that, with a few exceptions like prologue
and epilogue generation, it should be considered a BUG if a port
uses gen_rtx_MEM.  Almost always one should be using something
from the adjust_address family of routines.


What are the exact semantics of MEM_NOTRAP_P ?  The documentation
does not agree with the source.  reload sets MEM_NOTRAP_P on
registers that are spilled to memory.  However, writing to these
MEMs can trap if we have a stack overflow.

Ports also have to write to stack memory - in the prologue, and
for copies that use secondary memory (the SH doesn't allocate secondary
memory through the reload mechanisms, but it uses push / pop for 64 bit
copies between general purpose and floating point registers).  Thus,
this question is probably relevant for every port maintainer.

Moreover, register pushes/pops (other than for the return address in the
presence of builtin_return_address) in the prologue / epilogue can't alias
other memory accesses, and on ACCUMULATE_OUTGOING_ARGS, neither can
temporary pushes/pops (unless shrinkwrapping gets implemented).

Therefore, I think it would make sense to have a utility function that
generates a MEM with MEM_NONTRAP_P set to the appropriate value, and
a specific alias set.  We already have gen_const_mem for readonly
memory, so maybe that could be gen_pushpop_mem.




Re: MEM_NONTRAP_P and push/pop alias set (Was: Re: [attn port maintainers] fix 23671)

2005-09-02 Thread Joern RENNECKE

Richard Henderson wrote:

 


As a practical short-term concern, rtx_addr_can_trap_p will not return
true for any stack based reference, including push/pop.  So for 4.1, 
nothing need be done.  Longer term, the answer to the "what does notrap
 

Actually, the SHcompact saves / restores use neither stack nor frame pointer in their MEMs.


a specific alias set.  We already have gen_const_mem for readonly
memory, so maybe that could be gen_pushpop_mem.
   



I agree completely.  This should be done before a lot of target
code gets uglified.
 

Hmm, I've just found we already have a get_frame_alias_set.  So I'm working on a patch for a new function get_frame_mem, which sets MEM_NOTRAP_P and sets the alias set to get_frame_alias_set (), and to use this in all the places of the target-independent code where get_frame_alias_set is currently used (unless the address looks fishy).
There are many more places in the target-specific code, but I can't really test all the targets, so I'll leave patching that code up to the target maintainers.
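
To make the intent concrete, here is a minimal sketch of what I have in mind; the name matches what I said above, but the exact signature and placement (somewhere in emit-rtl.c, where rtl.h and the alias machinery are already available) are just assumptions until the patch is done:

   rtx
   get_frame_mem (enum machine_mode mode, rtx addr)
   {
     rtx mem = gen_rtx_MEM (mode, addr);

     /* Frame accesses cannot trap, short of a stack overflow, which
        reload already ignores for spill slots.  */
     MEM_NOTRAP_P (mem) = 1;
     /* Frame slots only alias other frame slots.  */
     set_mem_alias_set (mem, get_frame_alias_set ());
     return mem;
   }

Target code could then use such a helper instead of a bare gen_rtx_MEM for its push / pop and save / restore sequences.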


Re: RFC ping: Make regname use validate_change

2005-09-05 Thread Joern RENNECKE

in http://gcc.gnu.org/ml/gcc-patches/2005-08/msg01489.html
I've described a problem where regrename renamed a general
register creating an insn which does not satisfy the insn
predicate anymore.


I agree that regrename shouldn't make such replacements.
Constraints are important to drive reload, and sometimes you
need an insn predicate on top to verify that the operands of
the insn agree.  (Until recently insn predicates were also
needed to handle rare special cases that could not be expressed
in constraints because we ran out of distinct letters.)
Creating single-register classes is not a panacea, as it not
only can lead to exponential growth of the number of patterns,
but also changes REGNO_REG_CLASS, makes you rewrite all your
secondary / tertiary reload code, and it marks every single
register as likely spilled, thus pessimizing register allocation.


As a compromise I've suggested to just call the insn predicate.


This will still call all the "TARGET_FOO" predicates.  These
are basically static predicates that won't change once a set
of options have been chosen.  We are only interested in the
insn predicates which vary in value depending on the instruction
under consideration.

I think the best balance between performance impact and sanity of
the description language can be achieved by having one of the
gen* programs check if the predicate mentions insn and/or some
of the operands, and/or an attribute that depends on the insn.

On the other hand, the "TARGET_FOO" predicates tend to be cheap.
If we have genattrtab generate a function to tell us if the
insn predicate matters, that function will likely cost more than
evaluating a simple predicate.  A saving seems only plausible if
we use a lookup table instead, i.e. avoid a change of control flow.

So, on balance, I think that calling the insn predicate (if it is set)
is probably the best solution.
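
As a rough sketch of that compromise (the wrapper name is made up; validate_change and the generated recog are the existing pieces), the substitution could simply go through validate_change, which re-recognizes the insn and thereby re-evaluates the define_insn condition:

   /* Replace *LOC in INSN with hard register NEW_REGNO, but only if the
      modified insn is still recognized, i.e. the insn predicate still
      accepts it.  Returns true if the change was kept.  */
   static bool
   try_rename_operand (rtx insn, rtx *loc, unsigned int new_regno)
   {
     rtx new_reg = gen_rtx_REG (GET_MODE (*loc), new_regno);

     /* validate_change re-runs recog on INSN and cancels the change
        if the insn no longer matches.  */
     return validate_change (insn, loc, new_reg, 0);
   }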



sh64 support deteriorating

2005-09-09 Thread Joern RENNECKE
I can't justify spending the amount of time that it would take to make the sh64 port regression free.
The lack of a debugger that works reliably with recent gcc versions has led to an increasing backlog of uninvestigated execution failures.


Re: sh64 support deteriorating

2005-09-12 Thread Joern RENNECKE

Richard Henderson wrote:


On Fri, Sep 09, 2005 at 04:58:50PM +0100, Joern RENNECKE wrote:
 

The lack of a debugger that works reliably with recent gcc versions has 
led to an increasing backlog of uninvestigated execution failures.
   



Do you think it's the debugger or the compiler that's at fault?
 

The debugger crashes when certain (recently pretty much any) debug information is included.
Thus, the debugger is at fault, for crashing.  But for all I know, the compiler might also be at fault, for emitting invalid debug information; however, it is more likely newer debug information that my vintage gdb can't understand.

The execution failures are most likely gcc bugs, except when prefetching unmapped memory is involved; the simulator had a bug there.


SH patch applied (Was: Re: sh64 support deteriorating)

2005-09-12 Thread Joern RENNECKE

Kaz Kojima wrote:

 


some compile time errors in c/c++ test for sh64-unknown-linux-elf
http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00466.html

3 tests

gcc.c-torture/compile/simd-4.c
gcc.c-torture/execute/20050604-1.c
gcc.dg/torture/pr21817-1.c

fail with the similar ICE:

gcc/gcc/testsuite/gcc.c-torture/compile/simd-4.c: In function 'tempf':
gcc/gcc/testsuite/gcc.c-torture/compile/simd-4.c:15: error: unable to find a 
register to spill in class 'GENERAL_REGS'
gcc/gcc/testsuite/gcc.c-torture/compile/simd-4.c:15: error: this is the insn:
(insn 53 52 54 0 (set (subreg:DI (reg:V4SF 68 fr4 [196]) 0)
   (and:DI (subreg:DI (reg:V4SF 68 fr4 [196]) 0)
   (const_int -4294967296 [0x]))) 85 {anddi3} (nil)
   (nil))
 


Yes, these appeared also in the simulator tests.


It seems odd that the DImode subregs of V4SFmode registers are used
as the operands of logical operations, though I don't understand why
reload complains as above.
 


reload complained because HARD_REGNO_MODE_OK disallowed
V4SFmode in GENERAL_REGS.  Allowing that also causes register
allocation to use GENERAL_REGS in the first place.  An and with a
J16 constraint can also be done with FP_REGS using mov.ls from r63.
A natural way to implement this would use an fr (or rf) constraint in
one of the alternatives.  While looking at this I also found that we were
missing a register class for an fr constraint.  I've tested the attached
patch over the weekend for sh-elf and sh64-elf, and checked it in now.
2005-09-12  J"orn Rennecke <[EMAIL PROTECTED]>

* sh.h (HARD_REGNO_MODE_OK): Allow V4SFmode in general purpose
registers for TARGET_SHMEDIA.
(enum reg_class, REG_CLASS_NAMES, REG_CLASS_CONTENTS): Rename
GENERAL_FP_REGS to GENERAL_DF_REGS.  Add GENERAL_FP_REGS as union
of GENERAL_REGS and FP_REGS.

Index: sh.h
===
RCS file: /cvs/gcc/gcc/gcc/config/sh/sh.h,v
retrieving revision 1.276
diff -p -r1.276 sh.h
*** sh.h6 Aug 2005 13:26:24 -   1.276
--- sh.h12 Sep 2005 13:21:53 -
*** extern char sh_additional_register_names
*** 1152,1158 
|| GENERAL_REGISTER_P (REGNO)) \
 : (MODE) == V4SFmode \
 ? ((FP_REGISTER_P (REGNO) && ((REGNO) - FIRST_FP_REG) % 4 == 0) \
!   || (! TARGET_SHMEDIA && GENERAL_REGISTER_P (REGNO))) \
 : (MODE) == V16SFmode \
 ? (TARGET_SHMEDIA \
? (FP_REGISTER_P (REGNO) && ((REGNO) - FIRST_FP_REG) % 16 == 0) \
--- 1152,1158 
|| GENERAL_REGISTER_P (REGNO)) \
 : (MODE) == V4SFmode \
 ? ((FP_REGISTER_P (REGNO) && ((REGNO) - FIRST_FP_REG) % 4 == 0) \
!   || GENERAL_REGISTER_P (REGNO)) \
 : (MODE) == V16SFmode \
 ? (TARGET_SHMEDIA \
? (FP_REGISTER_P (REGNO) && ((REGNO) - FIRST_FP_REG) % 16 == 0) \
*** enum reg_class
*** 1341,1346 
--- 1341,1347 
DF_REGS,
FPSCR_REGS,
GENERAL_FP_REGS,
+   GENERAL_DF_REGS,
TARGET_REGS,
ALL_REGS,
LIM_REG_CLASSES
*** enum reg_class
*** 1365,1370 
--- 1366,1372 
"DF_REGS",  \
"FPSCR_REGS",   \
"GENERAL_FP_REGS",  \
+   "GENERAL_DF_REGS",  \
"TARGET_REGS",  \
"ALL_REGS", \
  }
*** enum reg_class
*** 1402,1408 
  /* FPSCR_REGS:  */\
{ 0x, 0x, 0x, 0x, 0x0080 }, \
  /* GENERAL_FP_REGS:  */   
\
!   { 0x, 0x, 0x, 0x, 0x0102ff00 }, \
  /* TARGET_REGS:  */   \
{ 0x, 0x, 0x, 0x, 0x00ff }, \
  /* ALL_REGS:  */  \
--- 1404,1412 
  /* FPSCR_REGS:  */\
{ 0x, 0x, 0x, 0x, 0x0080 }, \
  /* GENERAL_FP_REGS:  */   
\
!   { 0x, 0x, 0x, 0x, 0x0302 }, \
! /* GENERAL_DF_REGS:  */   
\
!   { 0x, 0x, 0x, 0x, 0x0302ff00 }, \
  /* TARGET_REGS:  */   \
{ 0x, 0x, 0x, 0x, 0x00ff }, \
  /* ALL_REGS:  */  \


Re: Retested: RFA: fix PR middle-end/23290

2005-09-12 Thread Joern RENNECKE

Thanks for the review.

Richard Henderson wrote:


Though I'll state again for the record that any ABI that bases
its decisions on modes instead of tree codes is broken.
 

The specific mode that was tested against was BLKmode.  If we want to make ports impervious to random use of BLKmode, we should declare the practice of FUNCTION_ARG yielding a REG rtx as obsolete, i.e. everything but a plain stack argument has to be expressed with a PARALLEL.

At the moment, tm.texi still states that a value not passed on the stack is usually expressed with a REG rtx, and these can't handle BLKmode.


reg_used_between_p vs. CLOBBER

2005-09-20 Thread Joern RENNECKE

reg_used_between_p checks the CALL_INSN_FUNCTION_USAGE of
a CALL_INSN for CLOBBERS.  I think it shouldn't; we are interested
in uses, not sets / clobbers.


reload_in conflicts (Was: Re: [patch RFC] SH: PR target/21623)

2005-09-27 Thread Joern RENNECKE

Kaz Kojima wrote:


Joern RENNECKE <[EMAIL PROTECTED]> wrote:
 

Sorry, I forgot that this is specified to depend on fpscr, and that we 
are running
optimize_mode_switching before reload now.  This makes this solution 
unusable

for TARGET_FMOVD.  This means we need to go the secondary / tertiary reload
route.  (At least for TARGET_FMOVD - we can use a special constraint to mean
FP_REGS if !TARGET_FMOVD and NO_REGS if TARGET_FMOVD.)
   



I'll try it, though the secondary/tertiary reload stuff may be beyond
me.
 

Hmm, I see we have a conflict on reload_insi.  There can only be one reload pattern per direction and mode.  And push_secondary_reload only generates tertiary reloads on behalf of reload patterns.
I suppose we could have push_secondary_reload set t_class in the
(icode == CODE_FOR_nothing) case.  That would make it simple to write
ports where multiple temporary registers are needed.
On the other hand, we'd have to make sure that we get the actual reload emitting right, since we now created a new case - and there'd still be the problem of possible clashes where you genuinely need two conflicting reload patterns.
An approach that solves these issues would be to change push_secondary_reload to parse multi-alternative reload patterns.  Now that we have multi-character constraints, we can use a special constraint in the place of a predicate.  So push_secondary_reload could first try to find a match that satisfies not only the operand[in_p] constraint for the to-be-reloaded operand, but also has an operand[!in_p] constraint (insn_class) matching the reload_class, and an operand[2] constraint matching the secondary reload class.  If that fails, it can choose the first match where the operand[in_p] constraint matches the operand and the operand[!in_p] constraint matches reload_class.
For backward compatibility, we can treat a single-alternative reload pattern like we do now, i.e. consider it matching if the operand predicate matches, no matter what the constraints say.




Re: [URGENT] GCC 4.0 Nomination

2005-10-04 Thread Joern RENNECKE

Joe Buck wrote at http://gcc.gnu.org/ml/gcc/2005-10/msg00075.html :

> My suggestion: anyone who is listed in the MAINTAINERS file, and who can
> make it to the dinner, could volunteer to accept the award.  If more than
> one want to go, and the dinner hosts are willing, you can all go up on
> stage together, like they do at the Oscars.  If more want to go than
> can be accomodated, then we have something to work out, but even in that
> case the people involved can discuss it among themselves and pick a
> representative.

I could make it there, but I'd have to leave shortly after 11 p.m., since the last train from Paddington to Bristol goes at half past eleven.



Re: Question on Dwarf2 unwind info and optimized code

2005-10-25 Thread Joern RENNECKE

In http://gcc.gnu.org/ml/gcc/2005-10/msg00823.html, Jim Wilson wrote:

> The frame info is primarily used for C++ EH stack unwinding.  Since
> you can't throw a C++ exception in an epilogue, epilogue frame info
> isn't needed for this, and was never implemented for most targets.

Which is a shame.


It can't be easily implemented in target-specific code alone.  Sometimes 
there is code after the epilogue, so there would have to be

a mechanism to get the dwarf virtual machine back to the pre-epilogue state.




svn diff branch working copy against mainline?

2005-11-02 Thread Joern RENNECKE
How do I diff a modified working copy of a branch against a specific 
version of the mainline?
This operation is essential when sanity-checking merges from mainline to a branch of files that have changed more in mainline than in the branch.

With cvs, that was as easy as saying:

bash-2.05b$ cvs diff -r sh-elf-4_1-merge-20050913 Makefile.in

With svn, specifying a revision alone won't help, since that will diff against the version of the file in the branch in the specified revision of the whole tree.

I tried:
bash-2.05b$ svn diff Makefile.in 
svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc/[EMAIL PROTECTED]


But that gives me an error message:

svn: Target lists to diff may not contain both working copy paths and URLs



Re: svn diff branch working copy against mainline?

2005-11-02 Thread Joern RENNECKE

Mike Stump wrote:

 svn diff --old svn+ssh://gcc.gnu.org/svn/gcc/tags/gcc_4_0_1_release/ 
gcc/file.c --new file.c


Thanks, --old / --new does the trick.  However, I must say the error 
message is rather misleading.


 
svn needs to go on a long command line diet, it is seriously no fun  
to repeat things, over, and over.  In cvs, I used the equivalent of  
svn+ssh://gcc.gnu.org/svn/gcc once a year or so, with svn, it is just  
annoying.


Agreed.  The problem here actually seems to be with the design (or 
non-design) of branches
and tags in svn.  IIUC, svn has no concept that all the files under 
trunk are really the mainline, and
that the trees under branches/xxx bear any relation to trunk other than 
that all the files and directories
happen to be related by a copy operation.  In order to have a shorthand 
to redirect from one branch
to another (or branch to mainline or vice versa), svn first would have 
to have a concept of where
the mainline and the branch roots are in the first place, and what to 
call them.  Likewise for tags.
Considering different possible repository layouts and languages for the trunk/branches designators in the actual full tree, that seems to require a configuration file (or a section in a larger configuration file).
Maybe something like:

# (nick)name-regexp        location
trunk                      trunk
mainline                   trunk
\(.*branchpoint.*\)        tags/\1
\(.*branch.*\)             branches/\1
\(.*\)                     tags/\1

with the further constraint that a match occurs only if the associated 
location exists.


E.g. if you say -rgcc_4_1_1_release, and tags/gcc_4_1_1_release exists, 
you'd
refer to files in that subtree rather than the ones your current working 
copy is based on.


Another problem with command line length extension is specifying the number of lines of context.  Specifying nine lines of context used to be a simple -9, then the POSIX police came and required us to use -U9 (without asking if we wanted to be posixly corrected or posixed any harder), and with svn this is finally expanded to -x -U9.



Re: SH: PR target/24445

2005-11-02 Thread Joern RENNECKE

Kaz Kojima wrote:



[.expand after the patch]
(set (reg/f:SI 160) (const:SI (unspec [(symbol_ref:SI ("baz"))] 7)))
(set (reg:SI 161) (plus:SI (reg:SI 12 r12) (reg/f:SI 160)))
(set (reg/f:SI 159) (mem/u/c:SI (reg:SI 161)))
(set (reg:SI 0 r0) (call (mem:SI (symbol_ref:SI ("bar")
(set (mem/c/i:SI (reg/f:SI 159)) (reg:SI 0 r0)))
 

The last insn is invalid.  Before reload, a return value in a CLASS_LIKELY_SPILLED hard reg cannot be used in arbitrary instructions, but has to be copied to a plain register first.


Re: svn diff branch working copy against mainline?

2005-11-02 Thread Joern RENNECKE

Daniel Berlin wrote:

 

svn needs to go on a long command line diet, 
   



True.
However, it *does* need some way to differentiate between url->url,
url->wc, and wc->url commands, so even if there was an SVNROOT, you'd
still have to specify it  on the command lines :)
 

I can't follow that reasoning.  If you have one or more full svn url(s), that is evident.
If you say [EMAIL PROTECTED] and no file with that exact name (i.e. including the @) exists, a specific version of the file in the repository is wanted.
If a file argument is specified, it is natural to get the repository location from the svn info on that file.  If no file argument is specified, but the current working directory has been checked out with svn, it is natural to use the repository location of that directory.


Re: svn diff branch working copy against mainline?

2005-11-02 Thread Joern RENNECKE

Daniel Berlin wrote:


Of course, the question always raised when you try to do this is "why is
this better than just using shell variables"

if you can give me a good answer to take back to [EMAIL PROTECTED], i'm
happy to

 

shell variable scenario: gcc.gnu.org has a special file that people can check out to get settings for the several hundred tags & branches, which they have to massage with a sed script to generate a file that is sourced by .profile / .bash_profile to agree with the access methods required to get through the local firewall, so something that looks like:
export 
gcc_4_1_1_release='svn+ssh://[EMAIL PROTECTED]/svn/gcc/tags/gcc_4_1_1_release'


Now that I have expanded my environment by hundreds of variables, I can do:
svn diff Makefile.in  $gcc_4_1_1_release/gcc/Makefile.in

But I'd rather do:
svn diff Makefile.in -rgcc_4_1_1_release

I.e. no silly naming of the path from the trunk root to the file, and no 
repetition of the file name itself.





Re: svn diff branch working copy against mainline?

2005-11-03 Thread Joern RENNECKE

Branko Čibej wrote:

 



It certainly seems that --old and --new are redundant.


I suggest a search in http://svn.haxx.se/dev/.


What should we search for?  I tried both of --old and --new, and both searches came up empty.




Also, you could
consider stealing some ideas from Perforce, where the command would be
something like

p4 diff [EMAIL PROTECTED] file.c

and the RCS figures out how to map the label to the repository version.
Basically, the # and @ characters are special; # is used to introduce
a revision number (the global revision number), and a number of things
can follow @, like a label, or a date.
  


This seems to be a common misconception. The important thing to 
remember here is that there is no separate namespace for labels and 
branches in SVN, and that the layout of the repository is arbitrary. 
IOW, the fact that you have branches in /branches is a convention, not 
something imposed by the SVN server.


It's not a misconception, it's a perception of an svn shortcoming.  
There should be a configurable mapping from branch/tag names to branch/tag
locations.  I.e. you tell the svn server once what your conventions are, 
and then you don't have to apply them by hand every time you refer to
a branch or tag.  Without such a mechanism, svn makes a rather poor cvs 
replacement.




With the above in mind, your p4 example would translate to something 
like this:


   svn diff [EMAIL PROTECTED]/gcc_4_0_1_release file.c


This syntax is bad not only because of the need to mention branches/ but 
also because you need to name file.c twice.
And how would you expand it to diff several files against a different 
branch/tag?




(not that this would work, for reasons discussed to death in the 
[EMAIL PROTECTED] archives).


url(s), please.

[translating branch->location]


  It can't, because it doesn't know that trunk is special.


Yes, that's one of the things that we have to tell it in some config file.  I think it's ok if we consider 'trunk' aka 'mainline' a branch/tag at the location trunk/ .  I.e. to diff a branch working copy against the current mainline, you could say -rtrunk; to diff it against yesterday's mainline, -rtrunk -Dyesterday or [EMAIL PROTECTED] .

(There is a remaining disadvantage here against cvs because we can no longer use a single version number to refer to a particular version in a particular branch.  However, that only really applies to single files; for multiple files, in cvs you'd have to use dates or symbolic tags to refer to a particular state of mainline/a branch.)




Now, as of not too long ago we can teach the svn client to expand the 
repository root; your example would become (assuming your working copy 
is on trunk, and assuming % expands to the repos root):


   svn diff --old %/branches/gcc_4_0_1_release/somedir/file.c --new 
file.c


I suppose that's a bit better, although I admit it's not ideal.



However, before coming up with a zillion suggestions about how to make 
the syntax nicer, please do consider the idea that we did put a lot of 
thought into the diff syntax,


Actually, it's not only diff.  Being able to refer to tags/branches by name is also important for other operations like log, annotate (aka blame), update, merge...
I get the impression that a lot of work has gone into designing the underlying repository, and that is certainly fundamental to having a powerful version control system, but the usability of the command-line interface is still way behind cvs when you want to seriously work with branches.

and that covering all the uses and edge cases and is not easy. I'll be 
the first to admit that the current syntax sucks, but it works -- as 
opposed to most proposed (and many once implemented, now defunct...) 
forms that usually break down in the most trivial cases.


Telling the SVN devs to "change the diff syntax like /this/" is a bit 
like telling the GCC devs to "just add this extension to g++". We all 
know what the response to /that/ usually is. :)


It's a bit different here because we had something that worked for us 
before - cvs.  The main lure of switching to svn has been the promise of
better performance when doing operations on branches, and keeping 
history across copy / rename operations.  If working on branches turns
out to be actually harder with svn because it has no clue where the 
branches are, we might be better off with cvs after all.


More to the point, if I understood Daniel Berlin correctly, he offered 
to do some work on subversion to make it fit the requirements of the gcc
project.  But before any such work with regards to naming tags/branches 
can be done, there should be a consensus from the gcc developers about
what it is we want, and buy-in from the svn developers that the design 
would be acceptable for svn.


Thus, it is important that we have this discussion about design first.

In general, a repository could have more than one project, and 
branches/tags for one project need to apply to ano

timezone of svn server for -r?

2005-11-03 Thread Joern RENNECKE

What timezone does the svn server use when I specify time & date with -r?
With cvs that was never an issue because I appended UTC to the time, but svn
rejects that, so it seems I have to convert the time into whatever timezone the server happens to use.


diffing directories with merged-as-deleted files?

2005-11-03 Thread Joern RENNECKE

I've a working copy of a branch in which I merged changes from mainline - including the deletion of the .cvsignore files.  When I try to diff a directory, it errors out on the .cvsignore files.  Is there an option not to diff files that don't exist?

bash-2.05b$ svn diff --old 
svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc --new gcc

Index: gcc/Makefile.in
===
/usr/bin/diff -up -F'^(' -u -L gcc/Makefile.in  
(.../svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc)   (revision 
106440) -L gcc/Makefile.in(.../gcc)   (working copy) 
gcc/.svn/tmp/text-base/Makefile.in.svn-base gcc/Makefile.in
--- gcc/Makefile.in 
(.../svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc)   (revision 106440)

+++ gcc/Makefile.in (.../gcc)   (working copy)
@@ -2400,7 +2400,8 @@ alias.o : alias.c $(CONFIG_H) $(SYSTEM_H
regmove.o : regmove.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) 
$(RTL_H) \

  insn-config.h timevar.h tree-pass.h \
  $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) function.h \
-   $(EXPR_H) $(BASIC_BLOCK_H) toplev.h $(TM_P_H) except.h reload.h
+   $(EXPR_H) $(BASIC_BLOCK_H) toplev.h $(TM_P_H) except.h reload.h \
+   $(OPTABS_H) gt-regmove.h
ddg.o : ddg.c $(DDG_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) \
  toplev.h $(RTL_H) $(TM_P_H) $(REGS_H) function.h \
  $(FLAGS_H) insn-config.h $(INSN_ATTR_H) except.h $(RECOG_H) \
@@ -2755,7 +2756,7 @@ GTFILES = $(srcdir)/input.h $(srcdir)/co
 $(srcdir)/tree-iterator.c $(srcdir)/gimplify.c \
 $(srcdir)/tree-chrec.h $(srcdir)/tree-vect-generic.c \
 $(srcdir)/tree-ssa-operands.h $(srcdir)/tree-ssa-operands.c \
-  $(srcdir)/tree-profile.c $(srcdir)/tree-nested.c \
+  $(srcdir)/tree-profile.c $(srcdir)/tree-nested.c $(srcdir)/regmove.c \
 $(srcdir)/ipa-reference.c \
 $(srcdir)/targhooks.c $(out_file) \
 @all_gtfiles@
@@ -2772,7 +2773,7 @@ gt-lists.h gt-alias.h gt-cselib.h gt-gcs
gt-expr.h gt-sdbout.h gt-optabs.h gt-bitmap.h gt-dojump.h \
gt-dwarf2out.h gt-reg-stack.h gt-dwarf2asm.h \
gt-dbxout.h gt-c-common.h gt-c-decl.h gt-c-parser.h \
-gt-c-pragma.h gtype-c.h gt-cfglayout.h \
+gt-c-pragma.h gtype-c.h gt-cfglayout.h gt-regmove.h \
gt-tree-mudflap.h gt-tree-vect-generic.h \
gt-tree-profile.h gt-tree-ssa-address.h \
gt-tree-ssanames.h gt-tree-iterator.h gt-gimplify.h \
Index: gcc/.cvsignore
===
/usr/bin/diff -up -F'^(' -u -L gcc/.cvsignore   
(.../svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc)   (revision 0) -L 
gcc/.cvsignore  (.../gcc)   (revision 106387) gcc/.svn/empty-file 
gcc/.cvsignore

/usr/bin/diff: gcc/.cvsignore: No such file or directory
svn: '/home/afra/users/renneckej/bin/gccdiff' returned 2





Re: diffing directories with merged-as-deleted files?

2005-11-03 Thread Joern RENNECKE

Daniel Jacobowitz wrote:


On Thu, Nov 03, 2005 at 07:15:22PM +, Joern RENNECKE wrote:
 


Index: gcc/.cvsignore
===
/usr/bin/diff -up -F'^(' -u -L gcc/.cvsignore   
(.../svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc)   (revision 0) -L 
gcc/.cvsignore  (.../gcc)   (revision 106387) gcc/.svn/empty-file 
gcc/.cvsignore

/usr/bin/diff: gcc/.cvsignore: No such file or directory
svn: '/home/afra/users/renneckej/bin/gccdiff' returned 2
   



Presumably this is a bug in your 'gccdiff' script?
 


Should it return success for diffing stuff that does not exist?

bash-2.05b$ cat ~/bin/gccdiff
#!/bin/bash
diff=/usr/bin/diff
args="-up -F'^('"

echo ${diff} ${args} "$@"
exec ${diff} ${args} "$@"




Re: diffing directories with merged-as-deleted files?

2005-11-03 Thread Joern RENNECKE

Daniel Jacobowitz wrote:

 


Whatever you want.  It should probably either return success, or use -N.
 

I also get a failure when I comment out the diff-cmd line in my ~/.subversion/config .
Does that mean that every subversion configuration that doesn't configure a diff-cmd to deal with non-existent files is broken?

bash-2.05b$ svn diff --old 
svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc --new gcc

Index: gcc/Makefile.in
===
--- gcc/Makefile.in 
(.../svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc)   (revision 106446)

+++ gcc/Makefile.in (.../gcc)   (working copy)
@@ -2400,7 +2400,8 @@
regmove.o : regmove.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) 
$(RTL_H) \

   insn-config.h timevar.h tree-pass.h \
   $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) function.h \
-   $(EXPR_H) $(BASIC_BLOCK_H) toplev.h $(TM_P_H) except.h reload.h
+   $(EXPR_H) $(BASIC_BLOCK_H) toplev.h $(TM_P_H) except.h reload.h \
+   $(OPTABS_H) gt-regmove.h
ddg.o : ddg.c $(DDG_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) \
   toplev.h $(RTL_H) $(TM_P_H) $(REGS_H) function.h \
   $(FLAGS_H) insn-config.h $(INSN_ATTR_H) except.h $(RECOG_H) \
@@ -2755,7 +2756,7 @@
  $(srcdir)/tree-iterator.c $(srcdir)/gimplify.c \
  $(srcdir)/tree-chrec.h $(srcdir)/tree-vect-generic.c \
  $(srcdir)/tree-ssa-operands.h $(srcdir)/tree-ssa-operands.c \
-  $(srcdir)/tree-profile.c $(srcdir)/tree-nested.c \
+  $(srcdir)/tree-profile.c $(srcdir)/tree-nested.c $(srcdir)/regmove.c \
  $(srcdir)/ipa-reference.c \
  $(srcdir)/targhooks.c $(out_file) \
  @all_gtfiles@
@@ -2772,7 +2773,7 @@
gt-expr.h gt-sdbout.h gt-optabs.h gt-bitmap.h gt-dojump.h \
gt-dwarf2out.h gt-reg-stack.h gt-dwarf2asm.h \
gt-dbxout.h gt-c-common.h gt-c-decl.h gt-c-parser.h \
-gt-c-pragma.h gtype-c.h gt-cfglayout.h \
+gt-c-pragma.h gtype-c.h gt-cfglayout.h gt-regmove.h \
gt-tree-mudflap.h gt-tree-vect-generic.h \
gt-tree-profile.h gt-tree-ssa-address.h \
gt-tree-ssanames.h gt-tree-iterator.h gt-gimplify.h \
svn: Can't open file 'gcc/.cvsignore': No such file or directory



Re: diffing directories with merged-as-deleted files?

2005-11-03 Thread Joern RENNECKE

Daniel Jacobowitz wrote:

 


Whatever you want.  It should probably either return success, or use -N.

 

P.S.: When I use a diff-cmd with -N, I not only get a diff for the 44 
files that are different,
but also a header for each of the 752 files that are identical, i.e. two 
lines for each file like:


Index: gcc/tree-ssa-operands.c
===

cvs would never do such nonsense.



svn repository incorrectly converted or corrupted

2005-11-03 Thread Joern RENNECKE
cvs version 1.1.1.1.2.1 of gcc/libjava/classpath/java/awt/im/InputContext.java , i.e. the sh-elf-4_1-branch head, is supposed to correspond to
svn+ssh://[EMAIL PROTECTED]/svn/gcc/branches/sh-elf-4_1-branch/libjava/classpath/java/awt/im/[EMAIL PROTECTED] .
However, every single line that was supposed to be kept from the previous version has been removed.


Re: svn diff branch working copy against mainline?

2005-11-03 Thread Joern RENNECKE

Branko Čibej wrote:


"--old" "--new"


Hmm, that finds a lot more, although not specific to options.
I've found one thread that seems slightly relevant, "Diff syntax changes for issue #1093".

I get the impression that you devise neat ways to navigate precisely through multiple time dimensions.  By all means, do that, but please don't use up the very option letters that cvs uses to do commonplace repository navigation with.
An ordinary tag is a static thing; you don't expect it to change over 
time.  An ordinary
branch will be present in the head revision.  With a suitable repository 
layout, and some
descriptive config files, the -r option should give capabilities 
comparable to cvs.
I suppose it doesn't matter if numerical arguments to -r are always pure 
operational versions,
but you should be able to name a branch/tag and thus imply a root 
directory for the location,
in the current head revision unless modified with @rev-number (on the 
branch name) or -D.



  As I said in another post, I don't want to repeat past 
[EMAIL PROTECTED] discussions on this list (and yes, you're 
talking about things we've already discussed to death. :)


So why is there no pointer in the FAQ to these posts?


Re: svn repository incorrectly converted or corrupted

2005-11-03 Thread Joern RENNECKE

Daniel Berlin wrote:

 



Simply do a recopy of libjava from the approriate tag, and all will be
well.
 


Do you have a list of potentially affected files?


Re: svn diff branch working copy against mainline?

2005-11-03 Thread Joern RENNECKE

Joern Rennecke wrote:

 
but you should be able to name a branch/tag and thus imply a root 
directory for the location,
in the current head revision unless modified with @rev-number (on the 
branch name) or -D.


P.S.: instead of adding a -D option we could also use a syntax of -r [EMAIL PROTECTED]




Re: diffing directories with merged-as-deleted files?

2005-11-03 Thread Joern Rennecke
> What version of svn?

The 1.3 release candidate.

> What is the exact branch you are trying to diff??

I had checked out a copy of the sh-elf-4_1-branch, and used
svn merge to apply the patches from the last merge point to
the current mainline.  This merge deleted the .cvsignore file
in my working copy, but svn diff still tries to reference it.


Re: diffing directories with merged-as-deleted files?

2005-11-04 Thread Joern RENNECKE

Daniel Berlin wrote:

 


I did

svn co svn+ssh://gcc.gnu.org/svn/gcc/branches/sh-elf-4_1-branch
cd sh-elf-4_1-branch
svn merge -r106276:106279 svn+ssh://gcc.gnu.org/svn/gcc/trunk .
(rev 106276:106279 contains the change that will remove .cvsignore)

[EMAIL PROTECTED]:/mnt/gccstuff/sh-elf-4_1-branch> svn diff -N
 

It's not the diff against the pristine copy that's the problem, but the 
diff against mainline.


I've renamed the toplevel dir to gcc (for compatibility with my symlink-building scripts), then changed current directory to it.  Then I did:

svn diff --old svn+ssh://[EMAIL PROTECTED]/svn/gcc/trunk/gcc --new gcc 
|less




Re: diffing directories with merged-as-deleted files?

2005-11-04 Thread Joern RENNECKE

Daniel Berlin wrote:

 


Uh, but a diff against the pristine copy is the same as a diff against
mainline at that point, since your only differences come from merging
the mainline.
 


No, the pristine copy is the pristine copy of the branch.  I want to diff
my working copy of the branch against the head of trunk.

 


This i can reproduce.
I imagine nobody noticed because, as i've pointed out above, this is a
very roundabout way of doing the same thing regular svn diff will tell
you at that point.
I'm committing a fix now and nominating it for 1.3.x
 


Thanks!



strange result when compiling w/ -fpreprocessed but w/out -fdumpbase

2005-11-11 Thread Joern RENNECKE

When you compile a file that contains a line directive, e.g.:

# 1 "../../libgcc2.c"
int f ()
{
 return 0;
}

using the -fpreprocessed option to cc1, but without -fdumpbase, the base filename of the line number directive is used both for the assembly output file and for debugging dumps from -da.  This can be rather confusing when you can find neither the output file nor the debugging dumps in the current directory.  And at the least, the -fpreprocessed documentation is wrong when it states that this option is implicit when the file ends in .i; this effect of -fpreprocessed only appears when the option is actually passed to cc1.



Re: Delay branch scheduling vs. the CFG

2005-11-16 Thread Joern RENNECKE

> > >   4. An entirely new basic block on its own.
> >
> > When can option 4 happen??
> IIRC it occurs when there was only 1 insn in either the target
> or fall-thru block.When it gets sucked into the delay
> slot of a branch, then it is effectively its own basic
> block.

When the fall-through block is ended by a code label, and has only one insn, and that is eligible for a delay slot which can be annulled-true, the fall-through block can end up in the delay slot.

When the target block is ended by an unconditional jump, and otherwise has only one insn, which is eligible for a delay slot in a preceding branch that can be annulled-false, the target block can end up in the delay slot.
Likewise, if the fall-through block consists only of a branch-delay-slot eligible insn and an unconditional jump, the branch and fall-through block can be converted into an inverted branch with an annulled-false delay slot insn.

And finally, sometimes earlier reorg changes have changed the data flow so that actually no annulled slots are required (if there were no changes before, jump optimization should have caught these opportunities, placing the lone insn in front of the branch); or the only data anti-dependency might be on the branch condition itself.


Re: Register Allocation

2005-11-24 Thread Joern RENNECKE

In http://gcc.gnu.org/ml/gcc/2005-11/msg01163.html, Ian Lance Taylor wrote:


Either way, register elimination can cause addresses which were valid
to become invalid, typically because valid offsets from the frame
pointer become invalid offsets from the stack pointer.  So that needs
to be cleaned up somewhere.


This is not just about requiring some cleanup somewhere.  Register
elimination and stack slot allocation determine the exact addresses that
are used, which in turn determine what reload inheritance is possible for
address reloads that are for stack slots which are close together on the
stack.  Getting this right is essential to avoid performance degradation on
some platforms.  These targets typically use LEGITIMIZE_RELOAD_ADDRESS to
split out-of-range addresses into a normal form with a base address load
and a memory access using this base with a small offset.
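
As a toy illustration of that normal form (the 60-byte limit and the offsets below are made up for the example, not the SH's actual ranges): by rounding the base adjustment to a fixed granularity, two stack slots that are close together end up using the same base register value, which is exactly what makes the base address reload inheritable.

   #include <stdio.h>

   #define MAX_DISP 60   /* assumed largest in-range displacement */

   /* Split OFFSET into a base adjustment and a small in-range
      displacement, so that slots with nearby offsets share a base.  */
   static void
   split_offset (long offset, long *base, long *disp)
   {
     *base = offset / MAX_DISP * MAX_DISP;
     *disp = offset - *base;
   }

   int
   main (void)
   {
     long base1, disp1, base2, disp2;

     split_offset (4100, &base1, &disp1);   /* slot A */
     split_offset (4104, &base2, &disp2);   /* slot B, 4 bytes away */
     /* Both slots get base adjustment 4080, so the insn that loads the
        base register can be reused (inherited) for both accesses.  */
     printf ("A: base %ld + %ld, B: base %ld + %ld\n",
             base1, disp1, base2, disp2);
     return 0;
   }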

On the other hand, the hard register spills appear to offer a new
opportunity: we have talked about shrink-wrapping code in the past, but
have never implemented this in gcc.
I think that register saves/restores can be considered
special cases of hard register spills.  In order to do this efficiently,
there would have to be some interface with the target to exploit insn
sequences that can save/restore multiple registers more efficiently in
bulk, .e.g load/store multiple, or auto-increment use on targets that
are otherwise ACCUMULATE_OUTGOING_ARGS.  On the other hand, these
techniques can also help when we need to spill multiple hard registers
around a tight loop.






Re: s390{,x} ABI incompatibility between gcc 4.0 and 4.1

2005-11-29 Thread Joern RENNECKE

Jakub Jelinek wrote:


I have looked just at one failure, but maybe all of them are the same thing.
typedef char __attribute__((vector_size (16))) v16qi;
int i = __alignof__ (v16qi);

with GCC 4.0 sets i to 8 (s390{,x} have BIGGEST_ALIGNMENT 64), but
GCC 4.1 sets i to 16.
The changes that created this binary incompatibility are
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23467
I think.  layout_type sets TYPE_ALIGN to 128 bits (size of v16qi)
and in 4.0 and earlier finalize_type_size used to decrease the size
to GET_MODE_ALIGNMENT (TImode), which is 64 on s390{,x}.

Was this change intentional?  If yes, I think it should be documented in 4.1
release notes, but I still hope it wasn't intentional.
 


No, it wasn't.  The change was supposed to affect structures only.
As I understand the documentation, the expected behaviour would be to limit the alignment to BIGGEST_ALIGNMENT, unless the user has specified a larger alignment with the aligned attribute.
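
To make the user-visible effect concrete, here is a self-contained example; the expected numbers in the comments are the s390{,x} values quoted above (BIGGEST_ALIGNMENT is 64 bits there), and the struct is only included to show that the type alignment change also changes structure layout:

   #include <stdio.h>
   #include <stddef.h>

   typedef char __attribute__ ((vector_size (16))) v16qi;

   struct wrapper { char c; v16qi v; };

   int
   main (void)
   {
     /* GCC 4.0 on s390{,x}: 8; GCC 4.1: 16.  */
     printf ("alignof (v16qi) = %d\n", (int) __alignof__ (v16qi));
     /* The member offset (and the struct size) changes accordingly.  */
     printf ("offsetof (wrapper, v) = %d\n",
             (int) offsetof (struct wrapper, v));
     return 0;
   }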

One possible solution appears to be along the lines of
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/stor-layout.c.diff?cvsroot=gcc&r1=1.239&r2=1.240 ,
but with the comment changed to explain what we want to preserve now.

However, I think it is probably better to start out with the right alignment, by fixing this code in layout_type to take BIGGEST_ALIGNMENT into account:
   case VECTOR_TYPE:
...
   /* Always naturally align vectors.  This prevents ABI changes
      depending on whether or not native vector modes are supported.  */
   TYPE_ALIGN (type) = tree_low_cst (TYPE_SIZE (type), 0);




Re: s390{,x} ABI incompatibility between gcc 4.0 and 4.1

2005-11-29 Thread Joern RENNECKE

Jakub Jelinek wrote:

 



If we use MIN (tree_low_cst (TYPE_SIZE (type), 0), BIGGEST_ALIGNMENT)
here, I'm afraid that would be much bigger ABI incompatibility.
Currently, say
typedef char __attribute__((vector_size (64))) v64qi;
is 64 bytes aligned on most arches, even when BIGGEST_ALIGNMENT is much
smaller.
GCC 4.0.x on s390{,x} aligned vector_size 1/2/4/8/64/128/... types
to their size, just vector_size 16 and 32 has been 8 bytes aligned
(BIGGEST_ALIGNMENT).
 

That sounds very strange.  Is there a rationale for that, or is this a 
SNAFU?



Not capping to BIGGEST_ALIGNMENT might have issues with some object formats
though, if they don't support ridiculously big aligments.
 


We have MAX_OFILE_ALIGNMENT for that.


Re: s390{,x} ABI incompatibility between gcc 4.0 and 4.1

2005-11-30 Thread Joern RENNECKE

Jakub Jelinek wrote:


On Tue, Nov 29, 2005 at 10:01:25PM +, Joern RENNECKE wrote:
 


If we use MIN (tree_low_cst (TYPE_SIZE (type), 0), BIGGEST_ALIGNMENT)
here, I'm afraid that would be much bigger ABI incompatibility.
Currently, say
typedef char __attribute__((vector_size (64))) v64qi;
is 64 bytes aligned on most arches, even when BIGGEST_ALIGNMENT is much
smaller.
GCC 4.0.x on s390{,x} aligned vector_size 1/2/4/8/64/128/... types
to their size, just vector_size 16 and 32 has been 8 bytes aligned
(BIGGEST_ALIGNMENT).


 

That sounds very strange.  Is there a rationale for that, or is this a 
SNAFU?
   



It is just a side-effect of the 3.4/4.0 code - if there was a supported
integer mode on the target for the requested size, then alignment of that
mode was used (and mode alignments are at most BIGGEST_ALIGNMENT).
If no integer mode was supported for that size, it would use the earlier
alignment, i.e. vector_size.
 


I would call that a bug then.


Unfortunately, while say vector_size (64) etc. vectors are IMHO very unlikely,
vector_size (16) will occurr in user programs from time to time and thus the
ABI incompatibility might affect existing programs.
We can document the incompatibility as a feature (after all, getting
rid of that alignment anomaly on s390{,x} wouldn't be a bad thing I guess).
 

Having types with an alignment larger than BIGGEST_ALIGNMENT in the absence of alignment attributes is also an anomaly.  And as you said, having vectors larger than BIGGEST_ALIGNMENT is not as likely as having ones that are within this size, so making the change to layout_type to honour BIGGEST_ALIGNMENT for vectors should not cause worse incompatibilities than doing nothing.
While the vector types are still rather new, we have a reasonable chance to get this right without causing too much disruption.  Fixing this at a later date will only get harder and harder.
Having excessive alignments for vector types would not only waste space on the stack and in structures, but also requires run-time stack aligning code and extra overhead in auto-vectorized code, and is also likely to cause failure to use vector operations at all at runtime because the alignment constraints are not met.


Not capping to BIGGEST_ALIGNMENT might have issues with some object formats
though, if they don't support ridiculously big aligments.
 



 


We have MAX_OFILE_ALIGNMENT for that.
   



But is it used for vector type alignments?
 

It is not used for type alignments, but for variables.  Thus, structs that contain such vectors can still have excessive alignment padding, but as a whole their alignment is restricted to MAX_OFILE_ALIGNMENT.


Re: RELOAD_OTHER bug?

2005-12-13 Thread Joern RENNECKE

Note that 4157 is out of order.  I *think* what's happening is that
the MERGE_TO_OTHER macro isn't taking into account that if you merge
RELOAD_OTHER and RELOAD_FOR_OTHER_ADDRESS, you can't end up with a
RELOAD_OTHER.


No, anything merged with RELOAD_OTHER has to be RELOAD_OTHER.




Re: RELOAD_OTHER bug?

2005-12-13 Thread Joern RENNECKE

DJ Delorie wrote:


No, anything merged with RELOAD_OTHER has to be RELOAD_OTHER.
   



Why?
 


RELOAD_FOR_OTHER_ADDRESS only lives till the RELOAD_OTHER input reloads.


Does this mean that RELOAD_FOR_OTHER_ADDRESS reloads can never be
merged with RELOAD_OTHER reloads?
 

Yes.  But if they load the same value as a RELOAD_OTHER input, they can 
share the same reload register.




Re: mainline bootstrap broken on i686-pc-linux-gnu

2005-12-13 Thread Joern RENNECKE

I see this here too.  Apparently this was caused by the i386.h PUSH_ROUNDING
change of this patch:

2005-12-13  Jakub Jelinek  <[EMAIL PROTECTED]>

   PR debug/25023
   PR target/25293
   * expr.c (emit_move_resolve_push): Handle PRE_MODIFY
   and POST_MODIFY with CONST_INT adjustment equal to PUSH_ROUNDING.
   Fix POST_INC/POST_DEC handling if PUSH_ROUNDING is not identity.
   * config/i386/i386.md (pushhi2, pushqi2): Use pushl instead of 
pushw.

   Set mode to SI, adjust constraints.
   (pushhi2_rex64, pushqi2_rex64): Set mode to DI.
   * config/i386/i386.h (PUSH_ROUNDING): Round up to 4 instead of 2 for
   32-bit code.

The push_operand predicate does not allow pushes with a mode that does not
agree with PUSH_ROUNDING.




Re: mainline bootstrap broken on i686-pc-linux-gnu

2005-12-13 Thread Joern RENNECKE




I can't reproduce it (otherwise I wouldn't have committed it), it
bootstrapped/regtested just fine for me.



Can one of those who can reproduce it give me preprocessed mf-runtime.i
and exact gcc options that triggered it?


I have attached the stripped down testcase.
The bug is triggered with:
./cc1  mf-runtime-i.c -march=i686

extern unsigned char __mf_lc_shift;

float
__mf_adapt_cache ()
{
  return __mf_lc_shift;
}


Re: RELOAD_OTHER bug?

2005-12-13 Thread Joern RENNECKE

DJ Delorie wrote:


Does this mean that RELOAD_FOR_OTHER_ADDRESS reloads can never be
merged with RELOAD_OTHER reloads?

 

Yes.  But if they load the same value as a RELOAD_OTHER input, they can 
share the same reload register.
   



So why does reload specifically check for RELOAD_FOR_OTHER_ADDRESS
when deciding if a merge to RELOAD_OTHER is permitted?  Is this a bug
in the current logic?

  for (j = 0; j < n_reloads; j++)
if (i != j && rld[j].reg_rtx != 0
&& rtx_equal_p (rld[i].reg_rtx, rld[j].reg_rtx)
&& (! conflicting_input
|| rld[j].when_needed == RELOAD_FOR_INPUT_ADDRESS
|| rld[j].when_needed == RELOAD_FOR_OTHER_ADDRESS))
  {
rld[i].when_needed = RELOAD_OTHER;
rld[j].in = 0;
reload_spill_index[j] = -1;
transfer_replacements (i, j);
 

That test checks that the value can actually live in the reload register not only during, but also in-between (if there is such a time) the two reloads.  Whether there is a reload type available that is suitable for the merged reload is another matter.
I see now that this code is in merge_assigned_reloads, so it might even be safe there to set the reload type to RELOAD_FOR_OTHER_ADDRESS.  You'll have to check if the reload type from that point onward is only needed to determine the time of the reload insn (rather than also the lifetime of the reload register).


Re: [PATCH] Fix bootstrap on i686-pc-linux-gnu

2005-12-13 Thread Joern RENNECKE

Jakub Jelinek wrote:

 


While we could use pushhi2 insn
(would need to use pre_modify rather than pre_dec etc.), it wouldn't
buy us anything.


Presumably, it would prevent a partial register stall.



Re: Huge compile time regressions

2005-12-19 Thread Joern RENNECKE

Daniel Berlin wrote:


On Thu, 2005-12-15 at 00:48 +0100, Steven Bosscher wrote:
 


Hi,

Someone caused a >10% compile time regression yesterday for CSiBE, see
http://www.csibe.org/draw-diag.php?branchid=mainline&flags=-Os&rel_flag=--none--&dataview=Timeline&finish_button=Finish&draw=sbs&view=1&basephp=l-sbs


Gr.
Steven

   



This is very very bad.

Joern, i'd imagine this was your patch.
 

I think this is related to using register liveness information from flow.  My original if-conversion patch http://gcc.gnu.org/ml/gcc-patches/2004-01/msg03281.html used a simple linear-time algorithm to identify registers that are local to each basic block.

I can think of two things that could cause a noticable slowdown:

- comparison of global_live_at_end in struct_equiv_init.  Most call sites call gcc_unreachable when struct_equiv_init fails.  If an additional parameter is passed into struct_equiv_init to tell if the comparison may fail, we can optimize the sanity check out for !ENABLE_CHECKING.
- The call to update_life_info_in_dirty_blocks when one of the compared 
blocks is dirty.  I'm not sure if we could get away with doing a local 
update, and/or starting the update only for the blocks under 
consideration.  If we don't have to do the global_live_at_end 
comparison, we can probably also skip the update if only one of the 
blocks is dirty, and use the global_live_at_end from the block that is 
not dirty.


A further improvement might be to remove the regset comparison from 
struct_equiv_init altogether, and only make sure we use a 
global_live_at_end that is right for at least one of the
blocks.  The regsets can be compared later when we have matched all the 
edges, maybe even the entire block.


We could also go back to making sure we have consistent data flow 
information at the start of the pass, and keep the bits that we need 
up-to-date as we go along.







Re: HARD_REGNO_MODE_OK_FOR_CLASS Might Be Nice (tm)

2006-01-03 Thread Joern RENNECKE

In http://gcc.gnu.org/ml/gcc/2005-12/msg00642.html, Bernd Jendrissek wrote:
> Which leads me to the subject.  Would it be a win to have a macro
> HARD_REGNO_MODE_OK_FOR_CLASS (REGNO, MODE, CLASS) which would be the
> authoritative test for this loop in find_reg()?  On my port, and I
> imagine on many others too, I think a default
>
> #ifndef HARD_REGNO_MODE_OK_FOR_CLASS
> #define HARD_REGNO_MODE_OK_FOR_CLASS(REGNO, MODE, CLASS) \
>   HARD_REGNO_MODE_OK ((REGNO), (MODE))
> #endif
>
> would be okay.

It's not that simple.  For example, consider multi-word integer arithmetic.
If you want to allocate a 32 bit integer register on your 16 bit x86,
all the integer registers are suitable as parts of the allocation.
However, if you start with the last integer register, the second part
will end up in a hard register which is not an integer register.
So to make this work, you'd have to say that the last integer register is
not suitable for SImode, SFmode or CHImode, the last three ones are not
suitable for DImode, DFmode, CSImode or SCmode etc.
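
To illustrate what that per-mode knowledge looks like (FIRST_INT_REG and LAST_INT_REG are made-up placeholders, not any real port's macros), the usual way a port encodes it today is to make HARD_REGNO_MODE_OK check that the whole multi-register value stays inside the integer register block:

   #define INT_REGNO_P(REGNO) \
     ((REGNO) >= FIRST_INT_REG && (REGNO) <= LAST_INT_REG)

   /* Accept REGNO as the first register of a MODE value only if the
      last register of the value is still an integer register.  */
   #define HARD_REGNO_MODE_OK(REGNO, MODE) \
     (INT_REGNO_P (REGNO) \
      && INT_REGNO_P ((REGNO) + HARD_REGNO_NREGS ((REGNO), (MODE)) - 1))

With such a definition, the last integer register is automatically rejected for the multi-word modes, which is the effect described above.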




RFD: CSiBE failure: typeof sometimes copies toplevel const qualifiers

2006-01-04 Thread Joern RENNECKE
In order to investigate the CSiBE compilation time regressions observed 
in December when my cross-jumping patches were installed, I set out to 
compare the timings of current mainline with and without the patches 
reinstated. However, unmodified mainline (r109325) fails to compile the 
linux benchmark with an sh-elf targeted compiler:

while [ $((I--)) -gt 0 ] ; do \
/usr/bin/time -a -o signal.o.time -f "%U" 
/mnt/scratch/base-20060104/bin/sh-elf-gcc -c -D__linux__ -D__KERNEL__ 
-DCONFIG_ARCH_S390X -DCONFIG_ARCH_S390 -U__i386__ -U__x86_64__ 
-I/mnt/scratch/CSiBE/base-20060104/linux-2.4.23-pre3-testplatform/include 
-w -Os -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe 
-fno-strength-reduce -nostdinc -iwithprefix include 
-DKBUILD_BASENAME=signal -c -o signal.o signal.c ; \

done
signal.c: In function ‘do_sigaltstack’:
signal.c:1148: error: assignment of read-only variable ‘__x’
signal.c:1149: error: assignment of read-only variable ‘__x’
signal.c:1150: error: assignment of read-only variable ‘__x’
signal.c: In function ‘do_sigaltstack’:
signal.c:1148: error: assignment of read-only variable ‘__x’
signal.c:1149: error: assignment of read-only variable ‘__x’
signal.c:1150: error: assignment of read-only variable ‘__x’
signal.c: In function ‘do_sigaltstack’:
signal.c:1148: error: assignment of read-only variable ‘__x’
signal.c:1149: error: assignment of read-only variable ‘__x’
signal.c:1150: error: assignment of read-only variable ‘__x’
make[4]: *** [signal.o] Error 1
make[4]: Leaving directory 
`/mnt/scratch/CSiBE/base-20060104/linux-2.4.23-pre3-testplatform/kernel'

make[3]: *** [first_rule] Error 2
make[3]: Leaving directory 
`/mnt/scratch/CSiBE/base-20060104/linux-2.4.23-pre3-testplatform/kernel'

make[2]: *** [_dir_kernel] Error 2
make[2]: Leaving directory 
`/mnt/scratch/CSiBE/base-20060104/linux-2.4.23-pre3-testplatform'

make[1]: *** [time.txt] Error 2
make[1]: Leaving directory 
`/mnt/scratch/CSiBE/base-20060104/linux-2.4.23-pre3-testplatform'

make: *** [linux-2.4.23-pre3-testplatform/result-time.csv] Error 2

The failure occurs when the include/asm/uaccess.h:get_user macro is called from kernel/signal.c:do_sigaltstack with a const source.

The testcase can also be condensed to:

typedef struct s
{
void *p;
} stack_t;

void
do_sigaltstack (const stack_t *uss)
{

__typeof__ (*(&uss->p)) x;
x = 0;
}

Apparently, the const-qualification of uss->p carries over to x.  When the '*' and '&' are removed from the typeof expression, it doesn't.
I can't find anything in the documentation that explicitly says if we should strip const-qualification or not.  However, there are a number of examples in extend.texi - although not in the typeof entry - that also declare temporary variables that are assigned to later, and thus would be prone to error when a const value is passed in, if const is propagated by typeof.


I think typeof should ignore toplevel const qualifiers, and that this 
should be documented. If we consider this a requirement, then we have a 
regression somewhere between

GNU C version 4.0.0 20050126 (experimental) (sh-elf) and
GNU C version 4.1.0 20050922 .

What puzzles me is that we have recent CSiBE results.





Re: RFD: CSiBE failure: typeof sometimes copies toplevel const qualifiers

2006-01-05 Thread Joern RENNECKE

Richard Guenther wrote:

 


This has been reported before and the kernel was fixed.  typeof now
always "returns" the effective type, including CV qualifiers in effect.
 


Huh?  Why would the effective type of

__typeof__ (*(&uss->p)) x;

be different from

__typeof__ ((uss->p)) x;

?

Moreover, if we include all the CV qualifiers, this example from extend.texi won't work when x is const:

@smallexample
#define foo(x)  \
 (@{   \
   typeof (x) tmp; \
   if (__builtin_types_compatible_p (typeof (x), long double)) \
 tmp = foo_long_double (tmp);  \
   else if (__builtin_types_compatible_p (typeof (x), double)) \
 tmp = foo_double (tmp);   \
   else if (__builtin_types_compatible_p (typeof (x), float))  \
 tmp = foo_float (tmp);\
   else\
 abort (); \
   tmp;\
 @})
@end smallexample



Re: Huge compile time regressions

2006-01-05 Thread Joern RENNECKE

Joern Rennecke wrote:

I've found that the most striking compilation time increase was for 
flex / parse.c, which is a bison parser.
-Os compilation for i686-pc-linux-gnu X sh-elf --disable-checking went 
from 0.95 to 4.5 seconds.


Optimizing the REG_SET_EQ invocations gave a moderate win, down to 3.5 
seconds.


2006-01-05  J"orn Rennecke <[EMAIL PROTECTED]>

* cfgcleanup.c: Reinstate patches for PR 20070
* struct-equiv.c (struct_equiv_regs_eq_p): New function.
(struct_equiv_init): Only call update_life_info_in_dirty_blocks
in order to initialize regsets if both blocks are dirty.
Make do sanity check of registers being equal for STRUCT_EQUIV_FINAL.
Add new parameter check_regs_eq.  Changed all callers.
* basic-block.h (struct_equiv_init): Update prototype.

Index: cfgcleanup.c
===
/usr/bin/diff -p -d -F^( -u -L cfgcleanup.c (revision 109329) -L 
cfgcleanup.c   (working copy) .svn/text-base/cfgcleanup.c.svn-base 
cfgcleanup.c
--- cfgcleanup.c(revision 109329)
+++ cfgcleanup.c(working copy)
@@ -60,9 +60,7 @@ Software Foundation, 51 Franklin Street,
 static bool first_pass;
 static bool try_crossjump_to_edge (int, edge, edge);
 static bool try_crossjump_bb (int, basic_block);
-static bool outgoing_edges_match (int, basic_block, basic_block);
-static int flow_find_cross_jump (int, basic_block, basic_block, rtx *, rtx *);
-static bool old_insns_match_p (int, rtx, rtx);
+static bool outgoing_edges_match (int *, struct equiv_info *);
 
 static void merge_blocks_move_predecessor_nojumps (basic_block, basic_block);
 static void merge_blocks_move_successor_nojumps (basic_block, basic_block);
@@ -74,7 +72,6 @@ static bool mark_effect (rtx, bitmap);
 static void notice_new_block (basic_block);
 static void update_forwarder_flag (basic_block);
 static int mentions_nonequal_regs (rtx *, void *);
-static void merge_memattrs (rtx, rtx);
 
 /* Set flags for newly created block.  */
 
@@ -881,319 +878,6 @@ merge_blocks_move (edge e, basic_block b
   return NULL;
 }
 
-
-/* Removes the memory attributes of MEM expression
-   if they are not equal.  */
-
-void
-merge_memattrs (rtx x, rtx y)
-{
-  int i;
-  int j;
-  enum rtx_code code;
-  const char *fmt;
-
-  if (x == y)
-return;
-  if (x == 0 || y == 0)
-return;
-
-  code = GET_CODE (x);
-
-  if (code != GET_CODE (y))
-return;
-
-  if (GET_MODE (x) != GET_MODE (y))
-return;
-
-  if (code == MEM && MEM_ATTRS (x) != MEM_ATTRS (y))
-{
-  if (! MEM_ATTRS (x))
-   MEM_ATTRS (y) = 0;
-  else if (! MEM_ATTRS (y))
-   MEM_ATTRS (x) = 0;
-  else 
-   {
- rtx mem_size;
-
- if (MEM_ALIAS_SET (x) != MEM_ALIAS_SET (y))
-   {
- set_mem_alias_set (x, 0);
- set_mem_alias_set (y, 0);
-   }
- 
- if (! mem_expr_equal_p (MEM_EXPR (x), MEM_EXPR (y)))
-   {
- set_mem_expr (x, 0);
- set_mem_expr (y, 0);
- set_mem_offset (x, 0);
- set_mem_offset (y, 0);
-   }
- else if (MEM_OFFSET (x) != MEM_OFFSET (y))
-   {
- set_mem_offset (x, 0);
- set_mem_offset (y, 0);
-   }
-
- if (!MEM_SIZE (x))
-   mem_size = NULL_RTX;
- else if (!MEM_SIZE (y))
-   mem_size = NULL_RTX;
- else
-   mem_size = GEN_INT (MAX (INTVAL (MEM_SIZE (x)),
-INTVAL (MEM_SIZE (y;
- set_mem_size (x, mem_size);
- set_mem_size (y, mem_size);
-
- set_mem_align (x, MIN (MEM_ALIGN (x), MEM_ALIGN (y)));
- set_mem_align (y, MEM_ALIGN (x));
-   }
-}
-  
-  fmt = GET_RTX_FORMAT (code);
-  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
-{
-  switch (fmt[i])
-   {
-   case 'E':
- /* Two vectors must have the same length.  */
- if (XVECLEN (x, i) != XVECLEN (y, i))
-   return;
-
- for (j = 0; j < XVECLEN (x, i); j++)
-   merge_memattrs (XVECEXP (x, i, j), XVECEXP (y, i, j));
-
- break;
-
-   case 'e':
- merge_memattrs (XEXP (x, i), XEXP (y, i));
-   }
-}
-  return;
-}
-
-
-/* Return true if I1 and I2 are equivalent and thus can be crossjumped.  */
-
-static bool
-old_insns_match_p (int mode ATTRIBUTE_UNUSED, rtx i1, rtx i2)
-{
-  rtx p1, p2;
-
-  /* Verify that I1 and I2 are equivalent.  */
-  if (GET_CODE (i1) != GET_CODE (i2))
-return false;
-
-  p1 = PATTERN (i1);
-  p2 = PATTERN (i2);
-
-  if (GET_CODE (p1) != GET_CODE (p2))
-return false;
-
-  /* If this is a CALL_INSN, compare register usage information.
- If we don't check this on stack register machines, the two
- CALL_INSNs might be merged leaving reg-stack.c with mismatching
- numbers of stack registers in the same basic blo

Re: Huge compile time regressions

2006-01-05 Thread Joern RENNECKE

Joern Rennecke wrote:


Joern Rennecke wrote:

I've found that the most striking compilation time increase was for 
flex / parse.c, which is a bison parser.
-Os compilation for i686-pc-linux-gnu X sh-elf --disable-checking 
went from 0.95 to 4.5 seconds.


Optimizing the REG_SET_EQ invocations gave a moderate win, down to 3.5 
seconds.


Doing only one update_life_info_in_dirty_blocks before the crossjumping 
makes the compilation time go right back to 0.95 seconds.  Whether that
works is another question... if there are any transformations that
invalidate global_live_at_end, we'll have to make them update these regsets.


2006-01-05  J"orn Rennecke <[EMAIL PROTECTED]>

* cfgcleanup.c: Reinstate patches for PR 20070
* struct-equiv.c (struct_equiv_regs_eq_p): New function.
(struct_equiv_init): Only call update_life_info_in_dirty_blocks
in order to initialize regsets if both blocks are dirty.
Make do sanity check of registers being equal for STRUCT_EQUIV_FINAL.
Add new parameter check_regs_eq.  Changed all callers.
* basic-block.h (struct_equiv_init): Update prototype.

* basic-block.h (STRUCT_EQUIV_SUSPEND_UPDATE): Define.
* struct-equiv.c (struct_equiv_regs_eq_p): Don't call
update_life_info_in_dirty_blocks if STRUCT_EQUIV_SUSPEND_UPDATE
is set in info->mode.
(struct_equiv_init): Likewise.  Also, add sanity check of
global_live_at_end in that case.
* cfgcleanup.c (try_optimize_cfg): Call
update_life_info_in_dirty_blocks before start of loop.  Set
STRUCT_EQUIV_SUSPEND_UPDATE in mode argument passed to
try_crossjump_bb.

Index: cfgcleanup.c
===
/usr/bin/diff -p -d -F^( -u -L cfgcleanup.c (revision 109329) -L 
cfgcleanup.c   (working copy) .svn/text-base/cfgcleanup.c.svn-base 
cfgcleanup.c
--- cfgcleanup.c(revision 109329)
+++ cfgcleanup.c(working copy)
@@ -60,9 +60,7 @@ Software Foundation, 51 Franklin Street,
 static bool first_pass;
 static bool try_crossjump_to_edge (int, edge, edge);
 static bool try_crossjump_bb (int, basic_block);
-static bool outgoing_edges_match (int, basic_block, basic_block);
-static int flow_find_cross_jump (int, basic_block, basic_block, rtx *, rtx *);
-static bool old_insns_match_p (int, rtx, rtx);
+static bool outgoing_edges_match (int *, struct equiv_info *);
 
 static void merge_blocks_move_predecessor_nojumps (basic_block, basic_block);
 static void merge_blocks_move_successor_nojumps (basic_block, basic_block);
@@ -74,7 +72,6 @@ static bool mark_effect (rtx, bitmap);
 static void notice_new_block (basic_block);
 static void update_forwarder_flag (basic_block);
 static int mentions_nonequal_regs (rtx *, void *);
-static void merge_memattrs (rtx, rtx);
 
 /* Set flags for newly created block.  */
 
@@ -881,319 +878,6 @@ merge_blocks_move (edge e, basic_block b
   return NULL;
 }
 
-
-/* Removes the memory attributes of MEM expression
-   if they are not equal.  */
-
-void
-merge_memattrs (rtx x, rtx y)
-{
-  int i;
-  int j;
-  enum rtx_code code;
-  const char *fmt;
-
-  if (x == y)
-return;
-  if (x == 0 || y == 0)
-return;
-
-  code = GET_CODE (x);
-
-  if (code != GET_CODE (y))
-return;
-
-  if (GET_MODE (x) != GET_MODE (y))
-return;
-
-  if (code == MEM && MEM_ATTRS (x) != MEM_ATTRS (y))
-{
-  if (! MEM_ATTRS (x))
-   MEM_ATTRS (y) = 0;
-  else if (! MEM_ATTRS (y))
-   MEM_ATTRS (x) = 0;
-  else 
-   {
- rtx mem_size;
-
- if (MEM_ALIAS_SET (x) != MEM_ALIAS_SET (y))
-   {
- set_mem_alias_set (x, 0);
- set_mem_alias_set (y, 0);
-   }
- 
- if (! mem_expr_equal_p (MEM_EXPR (x), MEM_EXPR (y)))
-   {
- set_mem_expr (x, 0);
- set_mem_expr (y, 0);
- set_mem_offset (x, 0);
- set_mem_offset (y, 0);
-   }
- else if (MEM_OFFSET (x) != MEM_OFFSET (y))
-   {
- set_mem_offset (x, 0);
- set_mem_offset (y, 0);
-   }
-
- if (!MEM_SIZE (x))
-   mem_size = NULL_RTX;
- else if (!MEM_SIZE (y))
-   mem_size = NULL_RTX;
- else
-   mem_size = GEN_INT (MAX (INTVAL (MEM_SIZE (x)),
-INTVAL (MEM_SIZE (y))));
- set_mem_size (x, mem_size);
- set_mem_size (y, mem_size);
-
- set_mem_align (x, MIN (MEM_ALIGN (x), MEM_ALIGN (y)));
- set_mem_align (y, MEM_ALIGN (x));
-   }
-}
-  
-  fmt = GET_RTX_FORMAT (code);
-  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
-{
-  switch (fmt[i])
-   {
-   case 'E':
- /* Two vectors must have the same length.  */
- if (XVECLEN (x, i) != XVECLEN (y, i))
-   return;
-
- for (j = 0; j < XVECLEN (

Re: A questionable predicate in sh/predicates.md

2006-01-09 Thread Joern RENNECKE

Kazu Hirata wrote:


Notice that match_code at the beginning does not mention PARALLEL, but
we have GET_CODE (op) != PARALLEL later.  Is this predicate intended
to accept PARALLEL as well?


Yes, it is.


If so, should we change the match_code at
the beginning?


Yes.
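
For illustration, a hedged, generic sketch of the kind of change being
discussed - an invented predicate, not the actual one from sh/predicates.md.
genpreds uses match_code to prune calls into the C body, so any code the body
is prepared to handle, PARALLEL included, has to be listed there as well:

;; Hypothetical predicate, for illustration only: "parallel" appears in
;; match_code because the C body below accepts a PARALLEL.
(define_predicate "example_src_operand"
  (match_code "reg,subreg,mem,parallel")
{
  if (GET_CODE (op) == PARALLEL)
    /* Element-by-element validation of the PARALLEL omitted in this sketch.  */
    return true;
  return general_operand (op, mode);
})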




Re: merges

2006-01-12 Thread Joern RENNECKE

Jakub Jelinek wrote:

 


Yes.  I think they are useful for all branches if you backport a patch for
a particular fix or e.g. fix something that is not yet fixed on the trunk
and will be only when a particular devel branch with that fix is merged
into trunk.  But in all cases that should be a single commit to fix a
particular bug (or a set of closely related bugs).  Plain merges from
other branches should just say what branch, perhaps revisions were merged.
 

I agree.  Long log messages might be useful for a revision, but not for 
all the bugs affected -
if the commit message is larger than xxx bytes, the bugs should get a 
message from a short
template with the revision(s) involved and a link to the commit message 
filled in.


RFA: re-instate struct_equiv code (Was: Re: Huge compile time regressions)

2006-01-13 Thread Joern RENNECKE

Joern Rennecke wrote:

 
Doing only one update_life_info_in_dirty_blocks before the 
crossjumping makes the compilation time go right back to 0.95 
seconds.  Whether that works is another question... if there are any
transformations that invalidate global_live_at_end, we'll have to make 
them update these regsets.


:ADDPATCH rtl-optimization:

In the previously posted patch, I used EXECUTE_IF_AND_IN_BITMAP, when I 
really meant EXECUTE_IF_XOR_IN_BITMAP - except that that doesn't exist.  
Replaced with bitmap_xor and EXECUTE_IF_SET_IN_BITMAP.
Testing also found one place where the register live information
inconsistency came from: when cross-jumping succeeds, the
global_live_at_end set of the block that is made to jump into
the other block needs adjusting.  There is a copy_reg_set which could 
have done it, except it is done before the blocks are split, and the 
blocks still contain the old instructions
with the different local registers.  Hence split_block calculates an 
incorrect new global_live_at_end for redirect_from.
Fixed by copying redirect_to->global_live_at_start into 
redirect_from->global_live_at_end after the splits.


Moreover, I've found that update_life_info_in_dirty_blocks must not be 
called while fake edges exist.


regression tested on i686-pc-linux-gnu native, X sh-elf and X sh64-elf.

2006-01-12  J"orn Rennecke <[EMAIL PROTECTED]>

* cfgcleanup.c: Reinstate patches for PR 20070
* struct-equiv.c (struct_equiv_regs_eq_p): New function.
(struct_equiv_init): Only call update_life_info_in_dirty_blocks
in order to initialize regsets if both blocks are dirty.
Make do sanity check of registers being equal for STRUCT_EQUIV_FINAL.
Add new parameter check_regs_eq.  Changed all callers.
* basic-block.h (struct_equiv_init): Update prototype.

* basic-block.h (STRUCT_EQUIV_SUSPEND_UPDATE): Define.
* struct-equiv.c (struct_equiv_regs_eq_p): Don't call
update_life_info_in_dirty_blocks if STRUCT_EQUIV_SUSPEND_UPDATE
is set in info->mode.
(struct_equiv_init): Likewise.  Also, add sanity check of
global_live_at_end in that case.
* cfgcleanup.c (try_optimize_cfg): Call
update_life_info_in_dirty_blocks before start of loop.  Set
STRUCT_EQUIV_SUSPEND_UPDATE in mode argument passed to
try_crossjump_bb.
(try_crossjump_to_edge): Set global_live_at_end of redirect_from
from global_live_at_start of redirect_to.
(outgoing_edges_match): Check number of edges before comparing
patterns.

Index: cfgcleanup.c
===
/usr/bin/diff -p -d -F^( -u -L cfgcleanup.c (revision 109499) -L 
cfgcleanup.c   (working copy) .svn/text-base/cfgcleanup.c.svn-base 
cfgcleanup.c
--- cfgcleanup.c(revision 109499)
+++ cfgcleanup.c(working copy)
@@ -60,9 +60,7 @@ Software Foundation, 51 Franklin Street,
 static bool first_pass;
 static bool try_crossjump_to_edge (int, edge, edge);
 static bool try_crossjump_bb (int, basic_block);
-static bool outgoing_edges_match (int, basic_block, basic_block);
-static int flow_find_cross_jump (int, basic_block, basic_block, rtx *, rtx *);
-static bool old_insns_match_p (int, rtx, rtx);
+static bool outgoing_edges_match (int *, struct equiv_info *);
 
 static void merge_blocks_move_predecessor_nojumps (basic_block, basic_block);
 static void merge_blocks_move_successor_nojumps (basic_block, basic_block);
@@ -74,7 +72,6 @@ static bool mark_effect (rtx, bitmap);
 static void notice_new_block (basic_block);
 static void update_forwarder_flag (basic_block);
 static int mentions_nonequal_regs (rtx *, void *);
-static void merge_memattrs (rtx, rtx);
 
 /* Set flags for newly created block.  */
 
@@ -881,319 +878,6 @@ merge_blocks_move (edge e, basic_block b
   return NULL;
 }
 
-
-/* Removes the memory attributes of MEM expression
-   if they are not equal.  */
-
-void
-merge_memattrs (rtx x, rtx y)
-{
-  int i;
-  int j;
-  enum rtx_code code;
-  const char *fmt;
-
-  if (x == y)
-return;
-  if (x == 0 || y == 0)
-return;
-
-  code = GET_CODE (x);
-
-  if (code != GET_CODE (y))
-return;
-
-  if (GET_MODE (x) != GET_MODE (y))
-return;
-
-  if (code == MEM && MEM_ATTRS (x) != MEM_ATTRS (y))
-{
-  if (! MEM_ATTRS (x))
-   MEM_ATTRS (y) = 0;
-  else if (! MEM_ATTRS (y))
-   MEM_ATTRS (x) = 0;
-  else 
-   {
- rtx mem_size;
-
- if (MEM_ALIAS_SET (x) != MEM_ALIAS_SET (y))
-   {
- set_mem_alias_set (x, 0);
- set_mem_alias_set (y, 0);
-   }
- 
- if (! mem_expr_equal_p (MEM_EXPR (x), MEM_EXPR (y)))
-   {
- set_mem_expr (x, 0);
- set_mem_expr (y, 0);
- set_mem_offset (x, 0);
- set_mem_offset (y, 0);
-   }
- 

How to reverse patch reversal in cfgcleanup.c (Was: RFA: re-instate struct_equiv code)

2006-01-13 Thread Joern RENNECKE
For easier reviewing, I have attached the diff to the cfgcleanup 
version previous to the patch backout.


I'm not sure what the best way to keep the svn history sane is.  When/if 
the patch is approved, should I first do an
svn merge -r108792:108791, check that in, and then apply the patch with 
the actual new stuff?

Or will an svn copy of cfgcleanup.c work better?
Index: cfgcleanup.c
===
/usr/bin/diff -p -d -F^( -u -L cfgcleanup.c (revision 108713) -L 
cfgcleanup.c   (working copy) .svn/tmp/text-base/cfgcleanup.c.svn-base 
cfgcleanup.c
--- cfgcleanup.c(revision 108713)
+++ cfgcleanup.c(working copy)
@@ -936,8 +936,8 @@ condjump_equiv_p (struct equiv_info *inf
   if (code2 == UNKNOWN)
 return false;
 
-  if (call_init && !struct_equiv_init (STRUCT_EQUIV_START | info->mode, info))
-gcc_unreachable ();
+  if (call_init)
+struct_equiv_init (STRUCT_EQUIV_START | info->mode, info, false);
   /* Make the sources of the pc sets unreadable so that when we call
  insns_match_p it won't process them.
  The death_notes_match_p from insns_match_p won't see the local registers
@@ -1096,7 +1096,7 @@ outgoing_edges_match (int *mode, struct 
}
 
  if (identical
- && struct_equiv_init (STRUCT_EQUIV_START | *mode, info))
+ && struct_equiv_init (STRUCT_EQUIV_START | *mode, info, true))
{
  bool match;
 
@@ -1118,17 +1118,20 @@ outgoing_edges_match (int *mode, struct 
}
 }
 
+  /* Ensure that the edge counts do match.  */
+  if (EDGE_COUNT (bb1->succs) != EDGE_COUNT (bb2->succs))
+return false;
+
   /* First ensure that the instructions match.  There may be many outgoing
  edges so this test is generally cheaper.  */
-  if (!struct_equiv_init (STRUCT_EQUIV_START | *mode, info)
+  /* FIXME: the regset compare might be costly.  We should try to get a cheap
+ and reasonably effective test first.  */
+  if (!struct_equiv_init (STRUCT_EQUIV_START | *mode, info, true)
   || !insns_match_p (BB_END (bb1), BB_END (bb2), info))
 return false;
 
-  /* Search the outgoing edges, ensure that the counts do match, find possible
- fallthru and exception handling edges since these needs more
- validation.  */
-  if (EDGE_COUNT (bb1->succs) != EDGE_COUNT (bb2->succs))
-return false;
+  /* Search the outgoing edges, find possible fallthru and exception
+ handling edges since these needs more validation.  */
 
   FOR_EACH_EDGE (e1, ei, bb1->succs)
 {
@@ -1353,8 +1356,6 @@ try_crossjump_to_edge (int mode, edge e1
fprintf (dump_file, "Splitting bb %i before %i insns\n",
 src2->index, nmatch);
   redirect_to = split_block (src2, PREV_INSN (info.cur.x_start))->dest;
-  COPY_REG_SET (info.y_block->il.rtl->global_live_at_end,
-   info.x_block->il.rtl->global_live_at_end);
 }
 
   if (dump_file)
@@ -1432,6 +1433,8 @@ try_crossjump_to_edge (int mode, edge e1
   to_remove = single_succ (redirect_from);
 
   redirect_edge_and_branch_force (single_succ_edge (redirect_from), 
redirect_to);
+  COPY_REG_SET (redirect_from->il.rtl->global_live_at_end,
+   redirect_to->il.rtl->global_live_at_start);
   delete_basic_block (to_remove);
 
   update_forwarder_flag (redirect_from);
@@ -1588,9 +1591,22 @@ try_optimize_cfg (int mode)
   bool changed;
   int iterations = 0;
   basic_block bb, b, next;
+  bool can_modify_jumps = ! targetm.cannot_modify_jumps_p ();
+  bool do_crossjump = false;
 
-  if (mode & CLEANUP_CROSSJUMP)
-add_noreturn_fake_exit_edges ();
+  if (can_modify_jumps && (mode & CLEANUP_CROSSJUMP))
+{
+  do_crossjump = true;
+  /* Life info updates malfunction in the presence of fake edges.
+If we want to do any updates while fake edges are present, we'll have
+to make sure to exclude them when recomputing global_live_at_end,
+or treat them like EH edges.  */
+  update_life_info_in_dirty_blocks (UPDATE_LIFE_GLOBAL_RM_NOTES,
+   (PROP_DEATH_NOTES
+| ((mode & CLEANUP_POST_REGSTACK)
+   ? PROP_POST_REGSTACK : 0)));
+  add_noreturn_fake_exit_edges ();
+}
 
   if (mode & (CLEANUP_UPDATE_LIFE | CLEANUP_CROSSJUMP | CLEANUP_THREADING))
 clear_bb_flags ();
@@ -1598,7 +1614,7 @@ try_optimize_cfg (int mode)
   FOR_EACH_BB (bb)
 update_forwarder_flag (bb);
 
-  if (! targetm.cannot_modify_jumps_p ())
+  if (can_modify_jumps)
 {
   first_pass = true;
   /* Attempt to merge blocks as made possible by edge removal.  If
@@ -1754,8 +1770,8 @@ try_optimize_cfg (int mode)
changed_here = true;
 
  /* Look for shared code between blocks.  */
- if ((mode & CLEANUP_CROSSJUMP)
- && try_crossjump_bb (mode, b))
+ if

Re: GCC 4.9.0 Released

2014-04-22 Thread Joern Rennecke
On 22 April 2014 14:10, Jakub Jelinek  wrote:
> One year and one month passed from the time when the last major version
> of the GNU Compiler Collection has been announced, so it is the time again
> to announce a new major GCC release, 4.9.0.
>
> GCC 4.9.0 is a major release containing substantial new
> functionality not available in GCC 4.8.x or previous GCC releases.
>
> The Local Register Allocator, introduced in GCC 4.8.0 for ia32 and
> x86-64 targets only, is now used also on the Aarch64, ARM, S/390
> and ARC targets by default and on PowerPC and RX targets optionally.

Actually, I had switched the default for ARC -mlra back to off because of
PR rtl-optimization/55464 - not being able to configure libgcc is a
show-stopper.

I just tried the testcase to see what it does now, and the -mlra option is not
even accepted.  cc1 complains that -mlra is valid for  but not for C.
It turns out that the comment I put into arc.opt about why I switched
the default
was interpreted as part of the option description.  Moving the comment allowed
-mlra to be accepted, and the PR55464 testcase no longer causes an ICE.

In fact, when I flip the default back to lra, it can configure libgcc;
however, it still fails to build it - throwing an ICE at lra-constraints.c:3492
while trying to compile libgcc2.c to __gcc_bcmp.o for arc600.


Re: LTO + conditional jump + delay slot

2014-04-30 Thread Joern Rennecke
On 30 April 2014 12:20, Richard Biener  wrote:

> the delay-slot code is fragile, you probably simply run into a bug.

In particular, we lack in-tree ports with multiple delay slots, so while the
support exists theoretically, it is not tested and maintained in any
meaningful way.


Re: soft-fp functions support without using libgcc

2014-05-22 Thread Joern Rennecke
On 21 May 2014 14:13, Sheheryar Zahoor Qazi
 wrote:
>>>Building libgcc is not optional.  It is required for all targets.
>
> So, irrespective whether i provide floating point implementation by
> soft-fp, fpu-bit or ieeelib, an error free libgcc build is a MUST?
>
> What if I dont want to generate calls to libgcc.a but want want gcc to
> generate inline code?

While this is not possible for all calls, a lot of library calls can be
avoided or emitted with a custom ABI by having a suitable expander
in the .md file that emits whatever you want.  E.g., several of the
ARC subtargets/multilibs emit inline code for the simpler soft-fp functions,
and custom calls to optimized assembler for medium complexity operations.
You should really read md.texi and look at optabs.def to get a glimpse
of the code generation customization potential of GCC.
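
As a concrete illustration of the expander approach, here is a minimal
sketch - the routine name __myport_addsf3 is invented, the pattern is not
taken from any in-tree port, and emit_library_call_value is used with its
GCC 4.x-era signature (it has changed in later releases):

;; Hypothetical addsf3 expander that calls a hand-written assembler
;; routine with its own ABI instead of letting optabs fall back to
;; libgcc's __addsf3.
(define_expand "addsf3"
  [(set (match_operand:SF 0 "register_operand" "")
        (plus:SF (match_operand:SF 1 "register_operand" "")
                 (match_operand:SF 2 "register_operand" "")))]
  ""
{
  rtx fn = gen_rtx_SYMBOL_REF (Pmode, "__myport_addsf3");
  rtx res = emit_library_call_value (fn, operands[0], LCT_NORMAL, SFmode,
                                     2, operands[1], SFmode,
                                     operands[2], SFmode);
  if (res != operands[0])
    emit_move_insn (operands[0], res);
  DONE;
})

For the simplest operations the body could instead emit the open-coded RTL
directly; which way to go is purely a per-operation cost decision.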


Re: combination of read/write and earlyclobber constraint modifier

2014-07-02 Thread Joern Rennecke
On 2 July 2014 08:02, Marc Glisse  wrote:
> On Wed, 2 Jul 2014, Tom de Vries wrote:
>
>> On 02-07-14 08:23, Marc Glisse wrote:
>>>
>>> I think it could have used (match_dup 0) instead of operand 1, if there
>>> had been only the first alternative. And then the constraint would have been
>>> +&.
>>
>>
>> isn't that explicitly listed as unsupported here (
>> https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3244
>> ):
>> ...
>> Note that match_dup should not be used to tell the compiler that a
>> particular register is being used for two operands (example: add that adds
>> one register to another; the second register is both an input operand and
>> the output operand). Use a matching constraint (see Simple Constraints) for
>> those. match_dup is for the cases where one operand is used in two places in
>> the template, such as an instruction that computes both a quotient and a
>> remainder, where the opcode takes two input operands but the RTL template
>> has to refer to each of those twice; once for the quotient pattern and once
>> for the remainder pattern.

Note that this uses 'should', not must.  That is shorthand for saying that,
in general, you shouldn't do it, although there can be special circumstances
where it is valid.
The distinction between multiple operands vs. a single operand that appears
multiple times in the RTL is not even something that makes sense in the
framework that the register allocators operate in.  You'd still be
well-advised not to use match_dup in your add pattern, though, because reload
needs to generate adds in some circumstances, and it has specific
requirements there.

The long explanation is that the matching constraint allows the
register allocators /
reload to fix up the operands to match, so if you want the pattern to be used to
implement this operation, and you don't mind some reg-reg moves to be used
if that's what it takes, you should use a matching constraint.

If, on the other hand, you have a pattern of marginal utility that is not
worth the trouble of doing extra reg-reg copies to utilize, a match_dup is
better.  Such patterns are not all that likely to be recognized by simple
matching/combining, but you can generate them in
expanders/splitters/peephole2s.
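
A hedged, generic sketch of the contrast (invented patterns, not from a real
port): in the first insn the matching constraint lets the allocator/reload
insert a copy so that operands 0 and 1 end up in the same register; in the
second, the match_dup bakes the reuse into the RTL itself, so nothing can be
fixed up later, and such an insn would normally only be created explicitly by
an expander, splitter or peephole2.

;; Matching constraint: the register allocator may emit a reg-reg move
;; to make operand 1 coincide with operand 0.
(define_insn "*addsi3_matching"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "0")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  "add\t%0,%2")

;; match_dup: the RTL says "the very same rtx as operand 0", so there is
;; nothing left for the allocator to fix up.
(define_insn "*addsi3_dup"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_dup 0)
                 (match_operand:SI 1 "register_operand" "r")))]
  ""
  "add\t%0,%1")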


Re: GCC version bikeshedding

2014-07-29 Thread Joern Rennecke
On 29 July 2014 18:14, Richard Biener  wrote:
> On July 29, 2014 6:45:13 PM CEST, Eric Botcazou  
> wrote:
>>> I think that if anybody has strong objections, now is the time to make
>>> them.  Otherwise I think we should go with this plan.
>>
>>IMHO the cure is worse than the disease.
>>
>>> Given that there is no clear reason to ever change the major version
>>> number, making that change will not convey any useful information to
>>> our users.  So let's just drop the major version number.  Once we've
>>> made that decision, then the next release (in 2015) naturally becomes
>>> 5.0, the release after that (in 2016) becomes 6.0, etc.
>>
>>I don't really understand the "naturally": if you drop the major version
>>number, the next release should be 10.0, not 5.0.
>
> 10.0 would be even better from a marketing perspective.

So if we want version number inflation with plausible deniability, how
about we first increment the minor version number - so we get 4.10.0,
and then we concatenate major and minor version number, yielding
410.0


Re: GCC version bikeshedding

2014-07-29 Thread Joern Rennecke
On 29 July 2014 18:30, Markus Trippelsdorf  wrote:
> Since gcc is released annually, why not tie the version to the year of
> the release, instead of choosing an arbitrary number?
>
> 15.0

What did the Romans ever do for us?  Release GCC XV, obviously...
Unfortunately, they couldn't release *.0 versions, for lack of a zero.

Now, if we are talking about the coming year, that would be 2015.
And since we use decimal numbers these days, that should be
reflected in version numbers of releases tagged anytime other
than 00:00 new years day.

A year without leap days/seconds has 365 days of 24 hours of 3600
seconds, so for second accuracy, we need eight digits after the
decimal point.

E.g. A GCC release on the 1st April 2015 at 09:00 UTC is made
90 days and 9 hours after the start of the year, and should thus carry
the version number  2015.24760274
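
(A quick check of that arithmetic, under the 365-day year assumed above:

\[ \frac{90 \cdot 86400 + 9 \cdot 3600}{365 \cdot 86400}
   = \frac{7808400}{31536000} \approx 0.24760274 \]

so the fractional part is indeed 24760274.)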


Re: GCC version bikeshedding

2014-07-29 Thread Joern Rennecke
On 29 July 2014 19:29, Joern Rennecke  wrote:
> E.g. A GCC release on the 1st April 2015 at 09:00 UTC is made
> 90 days and 9 hours after the start of the year, and should thus carry
> the version number  2015.24760274

P.S.: a patchlevel release in a subsequent year can be marked by increasing
the first digit after the decimal point beyond nine into the
hexadecimal realm and beyond.
So a patchlevel release of gcc 2015.x on 1st July 2018 12:00 UTC
would have version number 2015.Y9726027.


Re: GCC version bikeshedding

2014-08-06 Thread Joern Rennecke
On 6 August 2014 11:31, Richard Biener  wrote:
> Ok, so the problematical case is
>
> struct X { std::string s; };
> void foo (X&);

Wouldn't it be even more troublesome with an application that dynloads
dsos depending on user input?
The install script might check if the dso with the right soname is present,
but then you still get dynamic linker errors when the user tries to
do something with the application, which could be an arbitrary time after
the upgrade.

