On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay <a...@gjlay.de> wrote: > Hi, suppose the following C code: > > > static __inline__ __attribute__((__always_inline__)) > _Fract rbits (const int i) > { > _Fract f; > __builtin_memcpy (&f, &i, sizeof (_Fract)); > return f; > } > > _Fract func (void) > { > #if B == 1 > return rbits (0x1234); > #elif B == 2 > return 0.14222r; > #endif > } > > > Type-punning idioms like in rbits above are very common in libgcc, for example > in fixed-bit.c. > > In this example, both compilation variants are equivalent (provided int and > _Fract are 16 bits wide). The problem with the B=1 variant is that it is > inefficient: > > Variant B=1 needs 2 instructions.
B == 1 shows that fold-const.c native_interpret/encode_expr lack support for FIXED_POINT_TYPE. > Variant B=2 needs 11 instructions, 9 of them are not needed at all. > > The problem goes as follows: > > The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to > memory > moves through the frame: > > (insn 5 4 6 (set (reg:HI 45) > (const_int 4660 [0x1234])) bloat.c:5 -1 > (nil)) > > (insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8]) > (reg:HI 45)) bloat.c:5 -1 > (nil)) > > (insn 7 6 8 (set (reg:HQ 46) > (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1 > (nil)) > > (insn 8 7 9 (set (reg:HQ 43 [ <retval> ]) > (reg:HQ 46)) bloat.c:12 -1 > (nil)) > > > Is there a specific reason why this is not expanded as subreg like this? > > > (set (reg:HQ 46) > (subreg:HQ [(reg:HI 45)] 0)) Probably your target does not allow this? Not sure, but then a testcase like static __inline__ __attribute__((__always_inline__)) _Fract rbits (const int i) { _Fract f; __builtin_memcpy (&f, &i, sizeof (_Fract)); return f; } _Fract func (int i) { return rbits (i); } would be more interesting, as your testcase boils down to a constant folding issue (see above). > The insns are analyzed in .dfinit: > > ;; regs ever live 24[r24] 25[r25] 29[r29] > > 24/25 is the return register, 28/29 is the frame pointer: > > > (insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28) > (const_int 1 [0x1])) [2 S2 A8]) > (reg:HI 45)) bloat.c:5 82 {*movhi} > > > Is there a reason why R28 is not marked as live? > > > The memory accesses are optimized out in .fwprop2 so that no frame pointer is > needed any more. > > However, in the subsequent passes, R29 is still reported as "regs ever live" > which is not correct. > > The "regs ever live" don't change until .ira, where R28 springs to live again: > > ;; regs ever live 24[r24] 25[r25] 28[r28] 29[r29] > > And in .reload: > > ;; regular block artificial uses 28 [r28] 32 [__SP_L__] > ;; eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL] > ;; entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 > [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] > 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] > ;; exit block uses 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] > ;; regs ever live 24[r24] 25[r25] 28[r28] 29[r29] > > Outcome is that the frame pointer is set up without need, which is very costly > on avr. > > > The compiler is trunk from 2013-01-16 configured for avr: > > --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2 > > The example is compiled with > > avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1 > > > In what stage happens the misoptimization? > > Is this worth a PR? (I.e. is there a change that anybody cares?) Well, AVR folks may be interested, or people caring for fixed-point support. Richard. > Thanks.