Re: Imprecise data flow analysis leads to code bloat

Richard Biener Thu, 17 Jan 2013 03:39:47 -0800

On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay <a...@gjlay.de> wrote:
> Hi, suppose the following C code:
>
>
> static __inline__ __attribute__((__always_inline__))
> _Fract rbits (const int i)
> {
>     _Fract f;
>     __builtin_memcpy (&f, &i, sizeof (_Fract));
>     return f;
> }
>
> _Fract func (void)
> {
> #if B == 1
>     return rbits (0x1234);
> #elif B == 2
>     return 0.14222r;
> #endif
> }
>
>
> Type-punning idioms like in rbits above are very common in libgcc, for example
> in fixed-bit.c.
>
> In this example, both compilation variants are equivalent (provided int and
> _Fract are 16 bits wide).  The problem with the B=1 variant is that it is
> inefficient:
>
> Variant B=1 needs 11 instructions, 9 of them are not needed at all.
> Variant B=2 needs 2 instructions.

B == 1 shows that native_interpret_expr and native_encode_expr in
fold-const.c lack support for FIXED_POINT_TYPE.

>
> The problem goes as follows:
>
> The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to
> memory moves through the frame:
>
> (insn 5 4 6 (set (reg:HI 45)
>         (const_int 4660 [0x1234])) bloat.c:5 -1
>      (nil))
>
> (insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])
>         (reg:HI 45)) bloat.c:5 -1
>      (nil))
>
> (insn 7 6 8 (set (reg:HQ 46)
>         (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1
>      (nil))
>
> (insn 8 7 9 (set (reg:HQ 43 [ <retval> ])
>         (reg:HQ 46)) bloat.c:12 -1
>      (nil))
>
>
> Is there a specific reason why this is not expanded as subreg like this?
>
>
>   (set (reg:HQ 46)
>        (subreg:HQ (reg:HI 45) 0))

Probably your target does not allow this?  Not sure, but then a testcase like

static __inline__ __attribute__((__always_inline__))
_Fract rbits (const int i)
{
  _Fract f;
  __builtin_memcpy (&f, &i, sizeof (_Fract));
  return f;
}

_Fract func (int i)
{
  return rbits (i);
}

would be more interesting, as your testcase boils down to a constant
folding issue (see above).
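
To make the constant folding point concrete, here is a minimal sketch in
plain C of the encode-then-interpret step that would fold the punned
constant at compile time. It is not GCC code: fract16, encode_int16,
interpret_fract16, fold_view_convert and q15_from_double are hypothetical
names, fract16 is only a stand-in for a 16-bit signed _Fract (Q15, value =
raw / 2^15), and truncation toward zero is assumed when converting the
literal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit signed _Fract (Q15). */
typedef struct { int16_t raw; } fract16;

/* "Encode": write the object representation of an integer constant
   into a byte buffer.  Returns the number of bytes written, 0 on failure. */
static size_t encode_int16 (int16_t value, unsigned char *buf, size_t len)
{
    if (len < sizeof value)
        return 0;
    memcpy (buf, &value, sizeof value);
    return sizeof value;
}

/* "Interpret": read the bytes back as a fract16 constant.
   Returns 1 on success, 0 if the buffer is too short. */
static int interpret_fract16 (const unsigned char *buf, size_t len,
                              fract16 *out)
{
    if (len < sizeof out->raw)
        return 0;
    memcpy (&out->raw, buf, sizeof out->raw);
    return 1;
}

/* Fold a view-convert of an integer constant to fract16 by going
   through the byte buffer instead of through a stack slot at run time. */
static int fold_view_convert (int16_t value, fract16 *out)
{
    unsigned char buf[sizeof (int16_t)];
    size_t n = encode_int16 (value, buf, sizeof buf);
    return n != 0 && interpret_fract16 (buf, n, out);
}

/* Q15 bit pattern of a value in [-1, 1), truncating toward zero
   (an assumption of this sketch). */
static int16_t q15_from_double (double x)
{
    return (int16_t) (x * 32768.0);
}
```

With these helpers, fold_view_convert (0x1234, &f) leaves f.raw == 0x1234,
and q15_from_double (0.14222) yields the same bit pattern, which is why the
two variants of the original testcase are equivalent for a 16-bit _Fract.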

> The insns are analyzed in .dfinit:
>
> ;;  regs ever live       24[r24] 25[r25] 29[r29]
>
> 24/25 is the return register, 28/29 is the frame pointer:
>
>
> (insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
>                 (const_int 1 [0x1])) [2 S2 A8])
>         (reg:HI 45)) bloat.c:5 82 {*movhi}
>
>
> Is there a reason why R28 is not marked as live?
>
>
> The memory accesses are optimized out in .fwprop2 so that no frame pointer is
> needed any more.
>
> However, in the subsequent passes, R29 is still reported as "regs ever live"
> which is not correct.
>
> The "regs ever live" don't change until .ira, where R28 springs to life again:
>
> ;;  regs ever live       24[r24] 25[r25] 28[r28] 29[r29]
>
> And in .reload:
>
> ;;  regular block artificial uses        28 [r28] 32 [__SP_L__]
> ;;  eh block artificial uses     28 [r28] 32 [__SP_L__] 34 [argL]
> ;;  entry block defs     8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14
> [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22]
> 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
> ;;  exit block uses      24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
> ;;  regs ever live       24[r24] 25[r25] 28[r28] 29[r29]
>
> Outcome is that the frame pointer is set up without need, which is very costly
> on avr.
>
>
> The compiler is trunk from 2013-01-16 configured for avr:
>
>    --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2
>
> The example is compiled with
>
>    avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1
>
>
> At what stage does the misoptimization happen?
>
> Is this worth a PR? (I.e. is there a chance that anybody cares?)

Well, AVR folks may be interested, or people who care about fixed-point support.

Richard.

> Thanks.
