Re: Alias analysis or DSE problem?
On Sat, Oct 17, 2009 at 12:44 AM, Justin Seyster wrote:
> I'm currently porting a plug-in that used to target the 4.3.0-based
> plug-in branch to the new top-of-tree plug-in system.  I'm really
> stymied by a bug whose source I just cannot track down!  Usually that
> means there is an error in my code, but the problem seems to involve
> tree-ssa-dse.c and tree-ssa-alias.c, so I wanted to ask for a bit of
> help from the list.
>
> The plug-in is relatively simple: it's designed to synthesize calls to
> a hook function in specific places.  I streamlined the plug-in to the
> simplest case that still triggers the bug: the function call has just
> one argument, which is a pointer to a string constant.
>
> The test program I'm instrumenting gets hook calls in two places.  The
> first one goes in without a problem (I can compile the program and
> verify that the hook gets called with the correct string), but the
> second (basically identical) call triggers an assertion failure during
> the Dead Store Elimination (dse) pass.
>
> The problem occurs when dse_possible_dead_store_p() gets called for a
> GIMPLE_ASSIGN that comes just before the inserted GIMPLE_CALL in the
> same basic block.  dse_possible_dead_store_p() pulls the gimple_vdef
> out of this GIMPLE_ASSIGN (which is an SSA_NAME reference to .MEM) and
> then iterates through its imm_uses list (with FOR_EACH_IMM_USE_STMT).
>
> The surprise to me is that my inserted GIMPLE_CALL hook is somehow in
> that list!  That results in dse_possible_dead_store_p() checking the
> hook call with ref_maybe_used_by_call_p(), which goes through each
> argument in the call (in this case just a pointer to a string
> constant) to check if it aliases with another variable with
> refs_may_alias_p_1().
>
> The reference is an ADDR_EXPR, though, which it appears does not ever
> belong in refs_may_alias_p_1().  I say that because of the assertion
> statement at the beginning of refs_may_alias_p_1():
>
>   gcc_assert ((!ref1->ref
>                || SSA_VAR_P (ref1->ref)
>                || handled_component_p (ref1->ref)
>                || INDIRECT_REF_P (ref1->ref)
>                || TREE_CODE (ref1->ref) == TARGET_MEM_REF)...
>
> My ADDR_EXPR does not meet any of those conditions, so something has
> gone wrong here!
>
> After figuring all that out, I'm stuck with a ton of questions:
>
> 1) Why is my inserted GIMPLE_CALL hook ending up in some SSA_NAME's
> imm_uses list?  The correctly working hook never gets inserted onto an
> imm_uses list, so what could be making this one different?  Is it
> possible that I malformed the GIMPLE_CALL in some way that would cause
> that?

Because your call has side-effects on global memory, this is how they are
represented.  If your calls do not have such side-effects you should mark
them accordingly - as pure, const or simply no-vops.

> 2) Why is the assertion at the beginning of refs_may_alias_p_1() there?
> Somebody is responsible for making sure that refs that don't pass this
> check never end up in refs_may_alias_p_1(), but who?  Should
> ref_maybe_used_by_call_p() actually be making this check before calling
> refs_may_alias_p_1(), or should I be forming GIMPLE_CALLs so that all
> their arguments are suitable for refs_may_alias_p_1()?

I think the call you insert is not of valid gimple form - an address
argument has to fulfill the is_gimple_min_invariant() predicate.  Thus I
suspect you have something like &ptr->field in there which needs a
temporary.

Richard.

> Sorry for this enormous brain-dump e-mail!  I'm still really new to
> GCC, and I'd appreciate any guidance I can get here.  Thanks.
> --Justin
Bug in binop rotate ?
I have been adding rotate capability to the AVR port and have come across
what I think is a bug in optabs.c: expand_binop().

This occurs during a rotate expansion.  For example:

  target = op0 rotated by op1

In the particular situation (code extract below) it tries a reverse rotate
of (bits - op1), where this expression is expanded as a simple integer, a
negation or a subtraction depending on the type of op1 and target.

The expansion of the subtraction is using the mode of the target - I
believe it should be using the mode of op1.  The mode of the rotation
amount need not be the same as the target:

  target:DI = op0:DI rotate op1:HI

In my testcase it is not, and I get asserts later in simplify_rtx.  The
negation mode looks equally wrong.

Am I mistaken?

  /* If we were trying to rotate, and that didn't work, try rotating
     the other direction before falling back to shifts and bitwise-or.  */
  if (((binoptab == rotl_optab
        && optab_handler (rotr_optab, mode)->insn_code != CODE_FOR_nothing)
       || (binoptab == rotr_optab
           && optab_handler (rotl_optab, mode)->insn_code != CODE_FOR_nothing))
      && mclass == MODE_INT)
    {
      optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
      rtx newop1;
      unsigned int bits = GET_MODE_BITSIZE (mode);

      if (CONST_INT_P (op1))
        newop1 = GEN_INT (bits - INTVAL (op1));
      else if (targetm.shift_truncation_mask (mode) == bits - 1)
        newop1 = negate_rtx (mode, op1);
      else
        newop1 = expand_binop (mode, sub_optab,
                               GEN_INT (bits), op1,
                               NULL_RTX, unsignedp, OPTAB_DIRECT);
Re: Bug in binop rotate ?
On Sat, Oct 17, 2009 at 3:47 PM, Andrew Hutchinson wrote:
> I have been adding rotate capability to the AVR port and have come
> across what I think is a bug in optabs.c: expand_binop().
>
> This occurs during a rotate expansion.  For example:
>
>   target = op0 rotated by op1
>
> In the particular situation (code extract below) it tries a reverse
> rotate of (bits - op1), where this expression is expanded as a simple
> integer, a negation or a subtraction depending on the type of op1 and
> target.
>
> The expansion of the subtraction is using the mode of the target - I
> believe it should be using the mode of op1.  The mode of the rotation
> amount need not be the same as the target:
>
>   target:DI = op0:DI rotate op1:HI
>
> In my testcase it is not, and I get asserts later in simplify_rtx.  The
> negation mode looks equally wrong.
>
> Am I mistaken?

I think you are correct.

Richard.

>   /* If we were trying to rotate, and that didn't work, try rotating
>      the other direction before falling back to shifts and bitwise-or.  */
>   if (((binoptab == rotl_optab
>         && optab_handler (rotr_optab, mode)->insn_code != CODE_FOR_nothing)
>        || (binoptab == rotr_optab
>            && optab_handler (rotl_optab, mode)->insn_code != CODE_FOR_nothing))
>       && mclass == MODE_INT)
>     {
>       optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
>       rtx newop1;
>       unsigned int bits = GET_MODE_BITSIZE (mode);
>
>       if (CONST_INT_P (op1))
>         newop1 = GEN_INT (bits - INTVAL (op1));
>       else if (targetm.shift_truncation_mask (mode) == bits - 1)
>         newop1 = negate_rtx (mode, op1);
>       else
>         newop1 = expand_binop (mode, sub_optab,
>                                GEN_INT (bits), op1,
>                                NULL_RTX, unsignedp, OPTAB_DIRECT);
Re: Bug in binop rotate ?
Thanks for your review.  I have submitted a bug report.

Richard Guenther wrote:
> On Sat, Oct 17, 2009 at 3:47 PM, Andrew Hutchinson wrote:
>> I have been adding rotate capability to the AVR port and have come
>> across what I think is a bug in optabs.c: expand_binop().
>> [...]
>> The expansion of the subtraction is using the mode of the target - I
>> believe it should be using the mode of op1.  The mode of the rotation
>> amount need not be the same as the target.
>> [...]
>> Am I mistaken?
>
> I think you are correct.
>
> Richard.
Re: Constraint modifier for partially overlapping operands
Ian Lance Taylor wrote:
> Andrew Hutchinson writes:
>
>> I can use the "=" modifier to make operands use the same register and
>> early clobber "&" to avoid overlaps.
>>
>> Is it possible to have or construct a constraint that permits
>> partially overlapping operands (which neither "=" nor "&" would
>> allow)?  The case would be wide types taking multiple hard registers.
>>
>> eg Input r20..23 Output r22..25
>
> There is no such constraint today.  I suppose it would be possible to
> define such a constraint if it seemed useful.

  Maybe I'm misunderstanding, but I thought that was already the default
if you use neither "=" to specify full overlap nor "&" for no overlap?

  Frex, a lot of ABIs specify that DImode values stored in pairs of
SImode registers must always use an odd-even register pair (using a test
in HARD_REGNO_MODE_OK), but when I was working on a custom port that
allowed them in any register pair, GCC would happily generate partially
overlapping movdi instructions such as (set:DI (reg:DI 5) (reg:DI 6))
(i.e., move r6/7 -> r5/6).  This hasn't changed since 3.3, has it?

    cheers,
      DaveK
Re: Constraint modifier for partially overlapping operands
The situation comes up where no overlap or a partial overlap of registers
permits optimal code - since this can avoid using a scratch register.
Thus no overlap OR partial overlap is preferred (or required).

Using nothing leaves overlap without preference - full, partial, none.
Using "=" gives the least preferred case - full.
Using "&" gives only the non-overlapping case - none.

Ideally "NOT =" is required - which I would hope the register allocator
would quite like too.

Dave Korn wrote:
> Ian Lance Taylor wrote:
>> Andrew Hutchinson writes:
>>> I can use the "=" modifier to make operands use the same register
>>> and early clobber "&" to avoid overlaps.
>>>
>>> Is it possible to have or construct a constraint that permits
>>> partially overlapping operands (which neither "=" nor "&" would
>>> allow)?  The case would be wide types taking multiple hard registers.
>>>
>>> eg Input r20..23 Output r22..25
>>
>> There is no such constraint today.  I suppose it would be possible to
>> define such a constraint if it seemed useful.
>
> Maybe I'm misunderstanding, but I thought that was already the default
> if you use neither "=" to specify full overlap nor "&" for no overlap?
> [...]
> This hasn't changed since 3.3, has it?
>
>     cheers,
>       DaveK
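[For readers less familiar with the modifiers being discussed: a hedged
sketch of what "&" (early clobber) expresses today in GCC extended asm,
and by contrast what is missing.  The sum3 function and its x86 assembly
are illustrative, not from the thread; on non-x86 targets it falls back
to plain C.]

```c
#include <assert.h>

/* "&" (early clobber) tells the allocator the output is written before
   all inputs have been consumed, so it must not share a register with
   ANY input.  There is no modifier for the case the thread asks about:
   "partial overlap allowed (or preferred), full overlap forbidden".  */
static unsigned sum3(unsigned a, unsigned b, unsigned c)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned out = a;
    /* Two dependent adds: out is written between the reads of b and c,
       so without "&" the allocator could legally place c in the same
       register as out, clobbering it after the first add.  */
    __asm__ ("addl %1, %0\n\t"
             "addl %2, %0"
             : "+&r" (out)
             : "r" (b), "r" (c));
    return out;
#else
    return a + b + c;   /* portable fallback for other targets */
#endif
}
```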
Re: Constraint modifier for partially overlapping operands
On 10/16/2009 11:04 PM, Ian Lance Taylor wrote:
> Andrew Hutchinson writes:
>> I can use the "=" modifier to make operands use the same register and
>> early clobber "&" to avoid overlaps.
>>
>> Is it possible to have or construct a constraint that permits
>> partially overlapping operands (which neither "=" nor "&" would
>> allow)?  The case would be wide types taking multiple hard registers.
>>
>> eg Input r20..23 Output r22..25
>
> There is no such constraint today.  I suppose it would be possible to
> define such a constraint if it seemed useful.

I'd much prefer if the port decomposed its double word operations and
used the lower-subreg pass to decompose the double word registers.  At
which point the register allocator has all of the information it needs
to do the right thing.

r~
Re: Constraint modifier for partially overlapping operands
Andrew Hutchinson wrote:
> OR partial overlap is preferred (or required)

  Oh, I see what you mean.  Yes, that probably would be useful, thanks
for the clarification.

    cheers,
      DaveK
Re: Constraint modifier for partially overlapping operands
Yes.  But we need to lower after combine and before register allocation.
I'm still figuring out how to do that.

Lowering before combine in particular causes a lot of code bloat.  This
loses all optimization of conditional jumps, shifts etc.  In our case,
most lowering is delayed until after reload.  This retains the RTL
optimization but is suboptimal for allocation and lacks enough forward
propagation.  For a similar reason, not splitting wide types often
produces far better code.

One exception is DImode, which by default is lowered at expand - since
there are no DImode instructions defined.  This ends up with pretty dire
code, since the built-in expansion can't use a "carry" based pattern and
we again miss the RTL optimization at the wider level.

It would seem we need to have a target dependent pass order to improve on
this significantly.

Andy

Richard Henderson wrote:
> On 10/16/2009 11:04 PM, Ian Lance Taylor wrote:
>> [...]
>> There is no such constraint today.  I suppose it would be possible to
>> define such a constraint if it seemed useful.
>
> I'd much prefer if the port decomposed its double word operations and
> used the lower-subreg pass to decompose the double word registers.  At
> which point the register allocator has all of the information it needs
> to do the right thing.
>
> r~
__attribute__((optimize)) and fast-math related oddities
Hang on while I put on my flame-proof suit.  There.

Merrily trying to make a test-case showing how unmanageable it is to try
to override *math* flags per function, I soon had to stop because...

$ cat amusing.cc
#include <cmath>

static __attribute__((optimize("-fno-associative-math")))
double foo1(double x) { return (x + pow(2, 52)) - pow(2, 52); }

static __attribute__((noinline))
double bar1(double x) { return foo1(x); }

#ifdef HUH
static __attribute__((optimize("-fno-associative-math")))
double foo2(double x) { return (x + pow(2, 52)) - pow(2, 52); }

static __attribute__((noinline, optimize("-ffast-math")))
double bar2(double x) { return foo2(x); }
#endif

int main()
{
    double x = 1.1;
    if (bar1(x) == x) return 1;
#ifdef HUH
    if (bar2(x) == x) return 2;
#endif
    return 0;
}

$ g++-4.4 -O2 amusing.cc -ffast-math && ./a.out; echo $?
0
$ g++-4.4 -O2 amusing.cc -ffast-math -DHUH && ./a.out; echo $?
1
$ g++-4.4 -O2 amusing.cc && ./a.out; echo $?
0
$ g++-4.4 -O2 amusing.cc -DHUH && ./a.out; echo $?
1

... made even less sense than expected.  I got that, like for other
'incompatible' flags, conflicting math flags should prevent inlining -
only they don't.  And it's all weird.  But that one takes the cake.
Could someone tell me what the fuss is about?

$ g++-4.4 -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.1-6' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-objc-gc --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.1 (Debian 4.4.1-6)
GC overhead just because of aligning long double?
We're wasting 8 bytes for every gimple_seq_node_d on x86_64 just because
we might be allocating a structure with a long double element (16 byte
aligned).  I grepped and didn't find traces of such a use, so - can we
just document that callers need to round up allocation sizes to multiples
of the required alignment (and just enforce that for alignment
requirements bigger than pointers)?

A gimple_seq_node_d should be allocated from 24 byte page sizes but is
allocated from 32 byte page sizes because of the above issue - wasting
25%.

Thanks,
Richard.
Re: GC overhead just because of aligning long double?
On Sun, 18 Oct 2009, Richard Guenther wrote:
> We're wasting 8 bytes for every gimple_seq_node_d on x86_64 just
> because we might be allocating a structure with a long double element
> (16 byte aligned).  I grepped and didn't find traces of such a use, so
> - can we just document that callers need to round up allocation sizes
> to multiples of the required alignment (and just enforce that for
> alignment requirements bigger than pointers)?
>
> A gimple_seq_node_d should be allocated from 24 byte page sizes but is
> allocated from 32 byte page sizes because of the above issue - wasting
> 25%.

Changing MAX_ALIGNMENT from 16 to 8 on x86_64 makes us go from

  Total  2788975191  775857920  40632633400225192  55862833
  (source location / Garbage / Freed / Leak / Overhead / Times)

to

  Total  2262618351  747470600  38848265122407808  55849810
  (source location / Garbage / Freed / Leak / Overhead / Times)

for SPEC 2006 tonto.  I added gimple_seq_node_d, 40, 56, 72, 88, 104 and
120 as extra sizes, of which

  Total Allocated page size 24:  589531776
  Total Allocated page size 56:  213088232
  Total Allocated page size 72:  232481808

have a relevant amount of allocation.

In the past the issue was that gc allocs are not type-safe, so we
sometimes allocate sizes that are not multiples of the required
alignment, like for struct { int len; char c[1]; }.  None of the
existing uses should require larger than pointer alignment though, so
documenting that should be enough.

Richard.