[Bug middle-end/21111] IA-64 NaT consumption faults due to uninitialized register reads

wilson at specifixinc dot com Tue, 19 Apr 2005 16:05:27 -0700

------- Additional Comments From wilson at specifixinc dot com  2005-04-19 23:05 -------
Subject: Re:  IA-64 NaT consumption faults due to uninitialized register reads

pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2005-04-19 20:47 -------
> To me, the target specific code should be the one to fix this problem up and
> not the middle-end or at least have a hook for it so you don't mess around
> with other targets getting the speed up.  Anyways seems like someone thought
> it would be cool if they did this, oh well.

The change I am suggesting should not hurt performance, and I expect 
that it would actually help performance in many cases.

Currently, the first assignment to a bit-field in a structure is emitted
as a bit-field insert, even though the structure has no earlier contents
that need to be preserved.

If we zero the structure before the first assignment, then combine will 
give us a simple assignment instead, which will be faster than a 
bit-field insert for most targets.  This may also allow other assignments 
to be combined in, giving further benefits.  (There can be multiple 
first assignments if there are multiple blocks where the structure 
becomes live.)
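
As a rough source-level illustration of the zeroing idea (a sketch only:
the change being suggested happens inside the compiler, not in user code,
and the function name sub_zeroed is made up here), clearing the object
before the first field store means that store has no unknown neighbouring
bits to preserve:

```c
/* Two 32-bit bit-fields packed into one word. */
struct s { unsigned long i : 32; unsigned long j : 32; };

int i;

/* Hypothetical source-level analogue of "zero the structure before the
   first assignment".  After the clear, every bit of foo is known, so the
   store to foo.i does not depend on any uninitialized contents.  */
struct s
sub_zeroed (void)
{
   struct s foo = { 0 };   /* explicit zeroing before the first assignment */
   foo.i = i;              /* first real assignment to a bit-field */
   return foo;
}
```

Whether a given compiler version actually turns this into a plain store
depends on the target and on the optimizers; nothing was measured for this
sketch.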

I agree that the optimizations being performed by tree-ssa are useful 
here, but the big-picture benefits should not lead us to ignore the 
details.  Emitting a bit-field insert when only a simple assignment is 
needed is wrong.  It may cause performance loss on many targets, and it 
causes core dumps on IA-64, where the insert reads the uninitialized 
register and can trigger a NaT consumption fault.

Take a look at this example.

struct s { unsigned long i : 32; unsigned long j : 32; };
int i;
struct s
sub (void)
{
   struct s foo;
   foo.i = i;
   return foo;
}

Compiling this for x86-64 on mainline, I get 10 instructions, which 
perform two bit-field insertions.  Compiling this with gcc-3.3, I get 7 
instructions which perform one bit-field insertion.

I think the optimal code is two instructions, one to load i into the low 
part of the return register, and one to return.  The upper bits of the 
structure are don't-care bits, so we can set them to anything we want. 
There is no need for any bit-field insertion here at all.
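
To make the don't-care argument concrete, here is a hand-written sketch
of what that two-instruction version computes.  It assumes a little-endian
target on which struct s occupies exactly one 64-bit word with field i in
the low 32 bits (true for x86-64 with gcc, but not guaranteed by the C
standard); the name sub_plain is hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct s { unsigned long i : 32; unsigned long j : 32; };

int i;

/* Assumption: the whole structure fits in one 64-bit word.  */
_Static_assert (sizeof (struct s) == sizeof (uint64_t),
                "struct s must be one 64-bit word");

struct s
sub_plain (void)
{
   struct s foo;
   /* The low 32 bits carry i.  The upper 32 bits (field j) are don't-care;
      they are zero here only because that is what a zero-extending
      32-bit load produces.  Assumes little-endian bit-field layout.  */
   uint64_t word = (uint32_t) i;
   memcpy (&foo, &word, sizeof foo);   /* one whole-word store, no insert */
   return foo;
}
```

A caller that only reads the i field cannot tell this apart from the
original sub, which is the sense in which the upper bits are free.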

Mainline does even worse than gcc-3.3 here because in order to decompose 
the structure it creates a fake assignment to j, and then we end up 
emitting bit-field insertion code for that fake assignment, even though 
this code is completely useless.  Furthermore, the RTL optimizer is not 
able to delete the fake assignment, because it is a bit-field insert.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21111
