------- Additional Comments From wilson at specifixinc dot com 2005-04-19 23:05 -------
Subject: Re: IA-64 NaT consumption faults due to uninitialized register reads
pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-04-19 20:47 -------
> To me, the target specific code should be the one to fix this problem up and
> not the middle-end or at least have a hook for it so you don't mess around
> with other targets getting the speed up. Anyways seems like someone thought
> it would be cool if they did this, oh well.

The change I am suggesting should not hurt performance, and I expect that it would actually help performance in many cases. Currently, the first assignment to a structure is a bitfield insert. If we zero the structure before the first assignment, then combine will give us a simple assignment instead, which will be faster than a bitfield insert for most targets. This may also allow other assignments to be combined in, giving further benefits. (There can be multiple first assignments if there are multiple blocks where the structure becomes live.)

I agree that the optimizations being performed by tree-ssa are useful here, but one must not be confused by the big picture issues here into ignoring the details. Emitting a bit-field insert when only a simple assignment is needed is wrong. It may cause performance loss on many targets, and it causes core dumps on IA-64.

Take a look at this example.

    struct s { unsigned long i : 32; unsigned long j : 32;};
    int i;
    struct s sub (void)
    {
      struct s foo;
      foo.i = i;
      return foo;
    }

Compiling this for x86-64 on mainline, I get 10 instructions, which perform two bit-field insertions. Compiling this with gcc-3.3, I get 7 instructions which perform one bit-field insertion. I think the optimal code is two instructions, one to load i into the low part of the return register, and one to return. The upper bits of the structure are don't care bits, so we can set them to anything we want. There is no need for any bitfield insertion here at all.
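
To make the difference between the two forms concrete, here is a rough source-level sketch. It is a model written for illustration, not GCC internals or actual compiler output: the helper names (bitfield_insert_low32, simple_assign_low32, check_zeroed_insert_is_plain_store, sub_zeroed) are hypothetical, and it assumes the field i occupies the low 32 bits of a 64-bit word, as it does on x86-64. A bitfield insert has to read the old word and preserve its upper bits; once the word is known to be zero, the same result is a plain store with no input dependency on the old contents.

```c
#include <assert.h>
#include <stdint.h>

#define LOW32_MASK UINT64_C(0xFFFFFFFF)

/* Model of a bitfield insert: read-modify-write that keeps the upper
   32 bits of `word` and replaces the low 32 bits with `val`. */
static uint64_t bitfield_insert_low32(uint64_t word, uint32_t val)
{
    return (word & ~LOW32_MASK) | (uint64_t)val;
}

/* Model of a simple assignment: no read of the old word at all; the
   upper bits simply come out as zero. */
static uint64_t simple_assign_low32(uint32_t val)
{
    return (uint64_t)val;
}

/* If the word was zeroed first, the insert collapses to the plain store. */
static void check_zeroed_insert_is_plain_store(uint32_t val)
{
    uint64_t zeroed = 0;
    assert(bitfield_insert_low32(zeroed, val) == simple_assign_low32(val));
}

/* Source-level analogue of zeroing the structure before its first
   assignment (assumption: this only mirrors the idea; the proposed
   change would be made inside the compiler, not in user code). */
struct s { unsigned long i : 32; unsigned long j : 32; };

static struct s sub_zeroed(unsigned long v)
{
    struct s foo = {0};  /* every field starts out defined */
    foo.i = v;           /* first real assignment */
    return foo;
}
```

In this model the insert form depends on the previous contents of the word, which is exactly the uninitialized read that matters on IA-64, while the plain-store form does not.
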
Mainline does even worse than gcc-3 here because in order to decompose the structure it creates a fake j assignment, and then we end up emitting bitfield insertion code for the fake j assignment, even though this code is completely useless. Furthermore, the RTL optimizer is not able to delete this fake j assignment, because it is a bitfield insert.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21111