Re: Alias analysis or DSE problem?

2009-10-17 Thread Richard Guenther
On Sat, Oct 17, 2009 at 12:44 AM, Justin Seyster  wrote:
> I'm currently porting a plug-in that used to target the 4.3.0-based
> plug-in branch to the new top-of-tree plug-in system.  I'm really
> stymied by a bug whose source I just cannot track down!  Usually that
> means there is an error in my code, but the problem seems to involve
> tree-ssa-dse.c and tree-ssa-alias.c, so I wanted to ask for a bit of
> help from the list.
>
> The plug-in is relatively simple: it's designed to synthesize calls to
> a hook function in specific places.  I streamlined the plug-in to the
> simplest case that still triggers the bug: the function call has just
> one argument, which is a pointer to a string constant.
>
> The test program I'm instrumenting gets hook calls in two places.  The
> first one goes in without a problem (I can compile the program and
> verify that the hook gets called with the correct string), but the
> second (basically identical) call triggers an assertion failure during
> the Dead Store Elimination (dse) pass.
>
> The problem occurs in dse_possible_dead_store_p() getting called for a
> GIMPLE_ASSIGN that comes just before the inserted GIMPLE_CALL in the
> same basic block.  dse_possible_dead_store_p() pulls the gimple_vdef
> out of this GIMPLE_ASSIGN (which is an SSA_NAME reference to .MEM) and
> then iterates through its imm_uses list (with FOR_EACH_IMM_USE_STMT).
>
> The surpise to me is that my inserted GIMPLE_CALL hook is somehow in
> that list!  That results in dse_possible_dead_store_p() checking the
> hook call with ref_maybe_used_by_call_p(), which goes through each
> argument in the call (in this case just a pointer to a string
> constant) to check if it aliases with another variable with
> refs_may_alias_p_1().
>
> The reference is an ADDR_EXPR, though, which it appears does not ever
> belong in refs_may_alias_p_1().  I say that because of the assertion
> statement at the because of refs_may_alias_p_1():
>
>  gcc_assert ((!ref1->ref
>               || SSA_VAR_P (ref1->ref)
>               || handled_component_p (ref1->ref)
>               || INDIRECT_REF_P (ref1->ref)
>               || TREE_CODE (ref1->ref) == TARGET_MEM_REF)...
>
> My ADDR_EXPR does not meet any of those conditions, so something has
> gone wrong here!
>
> After figuring all that out, I'm stuck with a ton of questions:
>
> 1) Why is my inserted GIMPLE_CALL hook ending up in some SSA_NAME's
> imm_uses list?  The correctly working hook never gets inserted onto an
> imm_uses list, so what could be making this one different?  Is it
> possible that I malformed the GIMPLE_CALL in some way that would cause
> that?

Because your call has side-effects on global memory, this is how they
are represented.  If your calls do not have such side-effects you should
mark them accordingly - as pure, const or simply no-vops.

> 2) Why is the assertion at the beginning of may_alias_p_1() there?
> Somebody is responsible for making sure that refs who don't pass this
> check never end up in may_alias_p_1(), but who?  Should
> ref_maybe_used_by_call_p() actually be making this check before
> calling may_alias_p_1(), or should I be forming GIMPLE_CALLs so that
> all their arguments are suitable for may_alias_p_1()?

I think the call you insert is not of valid gimple form - an address
argument has to fulfill the is_gimple_min_invariant() predicate.
Thus I suspect you have something like &ptr->field in there which
needs a temporary.

Richard.

> Sorry for this enormous brain-dump e-mail!  I'm still really new to
> GCC, and I'd appreciate any guidance I can get here.  Thanks.
>        --Justin
>


Bug in binop rotate ?

2009-10-17 Thread Andrew Hutchinson
I have been adding rotate capability to AVR port and have come across 
what I think is bug in

optabs.c: expand_binop()

This occurs during a rotate expansion. For example

target  = op0  rotated by op1

In the particular situation (code extract below) it tries a reverse 
rotate of (bits - op1). Where this expression is expanded as a simple 
integer,

a negation or subtraction depending on type of op1 and target.

The expansion of the subtraction is using the mode of the target - I 
believe it should be using the mode of op1.

The mode of the rotation  amount need not be the same as the target.

target:DI = Op0:DI rotate op1:HI

In my testcase it is not and I get  asserts latter in simplfy_rtx.

The negation mode looks equally wrong.

Am I mistaken?


 /* If we were trying to rotate, and that didn't work, try rotating
the other direction before falling back to shifts and bitwise-or.  */
 if (((binoptab == rotl_optab
   && optab_handler (rotr_optab, mode)->insn_code != CODE_FOR_nothing)
  || (binoptab == rotr_optab
  && optab_handler (rotl_optab, mode)->insn_code != CODE_FOR_nothing))
 && mclass == MODE_INT)
   {
 optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
 rtx newop1;
 unsigned int bits = GET_MODE_BITSIZE (mode);

 if (CONST_INT_P (op1))
   newop1 = GEN_INT (bits - INTVAL (op1));
 else if (targetm.shift_truncation_mask (mode) == bits - 1)
   newop1 = negate_rtx (mode, op1);
 else
   newop1 = expand_binop (mode, sub_optab,
  GEN_INT (bits), op1,
  NULL_RTX, unsignedp, OPTAB_DIRECT);


Re: Bug in binop rotate ?

2009-10-17 Thread Richard Guenther
On Sat, Oct 17, 2009 at 3:47 PM, Andrew Hutchinson
 wrote:
> I have been adding rotate capability to AVR port and have come across what I
> think is bug in
> optabs.c: expand_binop()
>
> This occurs during a rotate expansion. For example
>
> target  = op0  rotated by op1
>
> In the particular situation (code extract below) it tries a reverse rotate
> of (bits - op1). Where this expression is expanded as a simple integer,
> a negation or subtraction depending on type of op1 and target.
>
> The expansion of the subtraction is using the mode of the target - I believe
> it should be using the mode of op1.
> The mode of the rotation  amount need not be the same as the target.
>
> target:DI = Op0:DI rotate op1:HI
>
> In my testcase it is not and I get  asserts latter in simplfy_rtx.
>
> The negation mode looks equally wrong.
>
> Am I mistaken?

I think you are correct.

Richard.

>
>  /* If we were trying to rotate, and that didn't work, try rotating
>    the other direction before falling back to shifts and bitwise-or.  */
>  if (((binoptab == rotl_optab
>   && optab_handler (rotr_optab, mode)->insn_code != CODE_FOR_nothing)
>      || (binoptab == rotr_optab
>      && optab_handler (rotl_optab, mode)->insn_code != CODE_FOR_nothing))
>     && mclass == MODE_INT)
>   {
>     optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
>     rtx newop1;
>     unsigned int bits = GET_MODE_BITSIZE (mode);
>
>     if (CONST_INT_P (op1))
>   newop1 = GEN_INT (bits - INTVAL (op1));
>     else if (targetm.shift_truncation_mask (mode) == bits - 1)
>   newop1 = negate_rtx (mode, op1);
>     else
>   newop1 = expand_binop (mode, sub_optab,
>                  GEN_INT (bits), op1,
>                  NULL_RTX, unsignedp, OPTAB_DIRECT);
>


Re: Bug in binop rotate ?

2009-10-17 Thread Andrew Hutchinson

Thanks for your review.

I have submitted bug report.



Richard Guenther wrote:

On Sat, Oct 17, 2009 at 3:47 PM, Andrew Hutchinson
 wrote:
  

I have been adding rotate capability to AVR port and have come across what I
think is bug in
optabs.c: expand_binop()

This occurs during a rotate expansion. For example

target  = op0  rotated by op1

In the particular situation (code extract below) it tries a reverse rotate
of (bits - op1). Where this expression is expanded as a simple integer,
a negation or subtraction depending on type of op1 and target.

The expansion of the subtraction is using the mode of the target - I believe
it should be using the mode of op1.
The mode of the rotation  amount need not be the same as the target.

target:DI = Op0:DI rotate op1:HI

In my testcase it is not and I get  asserts latter in simplfy_rtx.

The negation mode looks equally wrong.

Am I mistaken?



I think you are correct.

Richard.

  

 /* If we were trying to rotate, and that didn't work, try rotating
   the other direction before falling back to shifts and bitwise-or.  */
 if (((binoptab == rotl_optab
  && optab_handler (rotr_optab, mode)->insn_code != CODE_FOR_nothing)
 || (binoptab == rotr_optab
 && optab_handler (rotl_optab, mode)->insn_code != CODE_FOR_nothing))
&& mclass == MODE_INT)
  {
optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
rtx newop1;
unsigned int bits = GET_MODE_BITSIZE (mode);

if (CONST_INT_P (op1))
  newop1 = GEN_INT (bits - INTVAL (op1));
else if (targetm.shift_truncation_mask (mode) == bits - 1)
  newop1 = negate_rtx (mode, op1);
else
  newop1 = expand_binop (mode, sub_optab,
 GEN_INT (bits), op1,
 NULL_RTX, unsignedp, OPTAB_DIRECT);




  


Re: Constraint modifier for partially overlaping operands

2009-10-17 Thread Dave Korn
Ian Lance Taylor wrote:
> Andrew Hutchinson  writes:
> 
>> I can use "=" modifier to make operands use same register and early
>> clobber "&" to avoid overlaps.
>>
>> Is it possible to have or construct a contraint that permits partial
>> overlap operands. (which neither = or & would allow)
>> The case would be  wide types taking multiple hard registers.
>>
>> eg Input r20..23 Output r22..25
> 
> There is no such constraint today.  I suppose it would be possible to
> define such a constraint if it seemed useful.  

  Maybe I'm misunderstanding, but I thought that was already the default if
you use neither "=" to specify full overlap nor "&" for no overlap?

  Frex, a lot of ABIs specify that DImode values stored in pairs of SImode
registers must always use an odd-even register pair (using a test in
HARD_REGNO_MODE_OK), but when I was working on a custom port that allowed them
in any register pair, GCC would happily generate partially overlapping movdi
instructions such as (set:DI (reg:DI 5) (reg:DI 6)) (i.e., move r6/7 -> r5/6).
 This hasn't changed since 3.3, has it?

cheers,
  DaveK


Re: Constraint modifier for partially overlaping operands

2009-10-17 Thread Andrew Hutchinson


The situation comes up where no or a partial overlap of registers 
permits optimal code - since this can avoid using scratch register

Thus no overlap OR partial overlap is preferred (or required)

Using nothing leaves  overlap without preference - full, partial,none
Using = gives the least preffered case - full
Using & gives only the no-overlapping case - none

Ideally "NOT"= is required - which I would hope the register allocator 
would quite like too.




Dave Korn wrote:

Ian Lance Taylor wrote:
  

Andrew Hutchinson  writes:



I can use "=" modifier to make operands use same register and early
clobber "&" to avoid overlaps.

Is it possible to have or construct a contraint that permits partial
overlap operands. (which neither = or & would allow)
The case would be  wide types taking multiple hard registers.

eg Input r20..23 Output r22..25
  

There is no such constraint today.  I suppose it would be possible to
define such a constraint if it seemed useful.  



  Maybe I'm misunderstanding, but I thought that was already the default if
you use neither "=" to specify full overlap nor "&" for no overlap?

  Frex, a lot of ABIs specify that DImode values stored in pairs of SImode
registers must always use an odd-even register pair (using a test in
HARD_REGNO_MODE_OK), but when I was working on a custom port that allowed them
in any register pair, GCC would happily generate partially overlapping movdi
instructions such as (set:DI (reg:DI 5) (reg:DI 6)) (i.e., move r6/7 -> r5/6).
 This hasn't changed since 3.3, has it?

cheers,
  DaveK

  


Re: Constraint modifier for partially overlaping operands

2009-10-17 Thread Richard Henderson

On 10/16/2009 11:04 PM, Ian Lance Taylor wrote:

Andrew Hutchinson  writes:


I can use "=" modifier to make operands use same register and early
clobber "&" to avoid overlaps.

Is it possible to have or construct a contraint that permits partial
overlap operands. (which neither = or&  would allow)
The case would be  wide types taking multiple hard registers.

eg Input r20..23 Output r22..25


There is no such constraint today.  I suppose it would be possible to
define such a constraint if it seemed useful.


I'd much prefer if the port decomposed its double word operations and 
used the lower-subreg pass to decompose the double word registers.  At 
which point the register allocator has all of the information it needs 
to do the right thing.



r~


Re: Constraint modifier for partially overlaping operands

2009-10-17 Thread Dave Korn
Andrew Hutchinson wrote:

> OR partial overlap is preferred (or required)

  Oh, I see what you mean.  Yes, that probably would be useful, thanks for the
clarification.

cheers,
  DaveK



Re: Constraint modifier for partially overlaping operands

2009-10-17 Thread Andrew Hutchinson

Yes.

But we need to lower after combine and before register allocation.
I'm still figuring out how to do that.

Lowering before combine - in particular causes a lot of code bloat. This 
loose all optimization of conditional jumps, shifts etc.
In our case, most lowering is delayed until after reload. This retains 
the RTL optimization but is suboptimal for allocation and lacks enough 
forward propagation.


For a similar reason, not splitting wide types often produces far better 
code.


One exception is DImode which by default is  lowered at expand -since 
there are no  DImode instructions defined.
This ends up with pretty dire code since the built in expansion cant use 
a "carry" based pattern and we again miss
the RTL optimization at the wider level. 

It would seem we need to have target dependent pass order to improve on 
this significantly.



Andy








Richard Henderson wrote:

On 10/16/2009 11:04 PM, Ian Lance Taylor wrote:

Andrew Hutchinson  writes:


I can use "=" modifier to make operands use same register and early
clobber "&" to avoid overlaps.

Is it possible to have or construct a contraint that permits partial
overlap operands. (which neither = or&  would allow)
The case would be  wide types taking multiple hard registers.

eg Input r20..23 Output r22..25


There is no such constraint today.  I suppose it would be possible to
define such a constraint if it seemed useful.


I'd much prefer if the port decomposed its double word operations and 
used the lower-subreg pass to decompose the double word registers.  At 
which point the register allocator has all of the information it needs 
to do the right thing.



r~



__attribute__((optimize)) and fast-math related oddities

2009-10-17 Thread tbp
Hang on while i put on my flame-proof suit. There.
Merrily trying to make a test-case showing how unmanageable it is to
try to override *math* flags per function, i soon had to stop
because...
$ cat amusing.cc
#include 
static __attribute__((optimize("-fno-associative-math"))) double
foo1(double x) { return (x + pow(2, 52)) - pow(2, 52); }
static __attribute__((noinline)) double bar1(double x) { return foo1(x); }
#ifdef HUH
static __attribute__((optimize("-fno-associative-math"))) double
foo2(double x) { return (x + pow(2, 52)) - pow(2, 52); }
static __attribute__((noinline, optimize("-ffast-math"))) double
bar2(double x) { return foo2(x); }
#endif
int main() {
double x = 1.1;
if (bar1(x) == x) return 1;
#ifdef HUH
if (bar2(x) == x) return 2;
#endif
return 0;
}
$ g++-4.4 -O2 amusing.cc -ffast-math && ./a.out; echo $?
0
$ g++-4.4 -O2 amusing.cc -ffast-math -DHUH && ./a.out; echo $?
1
$ g++-4.4 -O2 amusing.cc && ./a.out; echo $?
0
$ g++-4.4 -O2 amusing.cc -DHUH && ./a.out; echo $?
1
... made even less sense than expected.
I got that like for other 'incompatible' flags, conflicting math flags
should prevent inlining, only they don't. And it's all weird. But that
one takes the cake. Could someone tell me what the fuss is about?

$ g++-4.4 -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian
4.4.1-6' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--enable-shared --enable-multiarch --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4
--program-suffix=-4.4 --enable-nls --enable-clocale=gnu
--enable-libstdcxx-debug --enable-mpfr --enable-objc-gc
--with-arch-32=i486 --with-tune=generic --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.1 (Debian 4.4.1-6)


GC overhead just because of aligning long double?

2009-10-17 Thread Richard Guenther

We're waisting 8 bytes for every gimple_seq_node_d on x86_64 just
because we might be allocating a structure with a long double
element (16 byte aligned).  I grepped and didn't find traces of
such a use, so - can we just document that callers need to
round up allocation sizes to multiples of the required alignment
(and just enforce that for alignment requirements bigger than pointers)?

A gimple_seq_node_d should be allocated from 24 byte page sizes
but are allocated from 32 byte page sizes because of the above
issue - wasting 25%.

Thanks,
Richard.


Re: GC overhead just because of aligning long double?

2009-10-17 Thread Richard Guenther
On Sun, 18 Oct 2009, Richard Guenther wrote:

> 
> We're waisting 8 bytes for every gimple_seq_node_d on x86_64 just
> because we might be allocating a structure with a long double
> element (16 byte aligned).  I grepped and didn't find traces of
> such a use, so - can we just document that callers need to
> round up allocation sizes to multiples of the required alignment
> (and just enforce that for alignment requirements bigger than pointers)?
> 
> A gimple_seq_node_d should be allocated from 24 byte page sizes
> but are allocated from 32 byte page sizes because of the above
> issue - wasting 25%.

Changing MAX_ALIGNMENT from 16 to 8 on x86_64 makes us go from

Total2788975191
775857920 40632633400225192 55862833
source location Garbage
Freed Leak OverheadTimes

to

Total2262618351
747470600 38848265122407808 55849810
source location Garbage
Freed Leak OverheadTimes

for SPEC 2006 tonto.  I added gimple_seq_node_d, 40, 56, 72, 88, 104
and 120 as extra sizes of which

Total Allocated page size  24:  589531776
Total Allocated page size  56:  213088232
Total Allocated page size  72:  232481808

have a relevant amount of allocation.

In the past the issue was that gc allocs are not type-safe, so
we sometimes allocate sizes that are not multiples of the required
alignment, like for struct { int len; char c[1]; }.  None
of the existing uses should require larger than pointer alignment
though, so documenting that should be enough.

Richard.