On Thu, Jan 17, 2013 at 6:04 PM, Georg-Johann Lay <a...@gjlay.de> wrote: > Richard Biener wrote: >> On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay wrote: >>> Hi, suppose the following C code: >>> >>> >>> static __inline__ __attribute__((__always_inline__)) >>> _Fract rbits (const int i) >>> { >>> _Fract f; >>> __builtin_memcpy (&f, &i, sizeof (_Fract)); >>> return f; >>> } >>> >>> _Fract func (void) >>> { >>> #if B == 1 >>> return rbits (0x1234); >>> #elif B == 2 >>> return 0.14222r; >>> #endif >>> } >>> >>> >>> Type-punning idioms like in rbits above are very common in libgcc, for >>> example >>> in fixed-bit.c. >>> >>> In this example, both compilation variants are equivalent (provided int and >>> _Fract are 16 bits wide). The problem with the B=1 variant is that it is >>> inefficient: >>> >>> Variant B=1 needs 2 instructions. >> >> B == 1 shows that fold-const.c native_interpret/encode_expr lack >> support for FIXED_POINT_TYPE. >> >>> Variant B=2 needs 11 instructions, 9 of them are not needed at all. > > I confused B=1 and B=2. The inefficient case with 11 instructions is B=1, of > course. > > Would a patch like below be acceptable in the current stage?
I'd be fine with it (not sure about test coverage). But please also add native_encode_fixed. > It's only the native_interpret and pretty much like the int case. > > Difference is that it rejects if the sizes don't match exactly. Hmm, yeah. I'm not sure why the _interpret routines chose to ignore tail padding ... was there any special correctness reason you did it differently than the int variant? > A new function > is used for low-level construction of a const_fixed_from_double_int. Isn't > there a better way? I wonder that such functionality is not already there... Good question - there are probably more places that could make use of this. > A quick test shows that it works reasonable on AVR. > >>> The problem goes as follows: >>> >>> The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to >>> memory >>> moves through the frame: >>> >>> (insn 5 4 6 (set (reg:HI 45) >>> (const_int 4660 [0x1234])) bloat.c:5 -1 >>> (nil)) >>> >>> (insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8]) >>> (reg:HI 45)) bloat.c:5 -1 >>> (nil)) >>> >>> (insn 7 6 8 (set (reg:HQ 46) >>> (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1 >>> (nil)) >>> >>> (insn 8 7 9 (set (reg:HQ 43 [ <retval> ]) >>> (reg:HQ 46)) bloat.c:12 -1 >>> (nil)) >>> >>> >>> Is there a specific reason why this is not expanded as subreg like this? >>> >>> >>> (set (reg:HQ 46) >>> (subreg:HQ [(reg:HI 45)] 0)) >> >> Probably your target does not allow this? Not sure, but then a testcase like > > This (movhq) is supported, similar for other fixed-point modes moves: > > QQ, UQQ, HQ, UHQ, HA, UHA, SQ, USQ, SA, USA, DQ, UDQ, DA, UDA, TA, UTA. > > >> static __inline__ __attribute__((__always_inline__)) >> _Fract rbits (const int i) >> { >> _Fract f; >> __builtin_memcpy (&f, &i, sizeof (_Fract)); >> return f; >> } >> >> _Fract func (int i) >> { >> return rbits (i); >> } >> >> would be more interesting, as your testcase boils down to a constant >> folding issue (see above). > > This case is optimized fine and the generated code is nice. I see - so it's really confused about the constant-ness. Yeah, I can imagine that. Thanks, Richard. >>> The insns are analyzed in .dfinit: >>> >>> ;; regs ever live 24[r24] 25[r25] 29[r29] >>> >>> 24/25 is the return register, 28/29 is the frame pointer: >>> >>> >>> (insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28) >>> (const_int 1 [0x1])) [2 S2 A8]) >>> (reg:HI 45)) bloat.c:5 82 {*movhi} >>> >>> >>> Is there a reason why R28 is not marked as live? >>> >>> >>> The memory accesses are optimized out in .fwprop2 so that no frame pointer >>> is >>> needed any more. >>> >>> However, in the subsequent passes, R29 is still reported as "regs ever live" >>> which is not correct. >>> >>> The "regs ever live" don't change until .ira, where R28 springs to live >>> again: >>> >>> ;; regs ever live 24[r24] 25[r25] 28[r28] 29[r29] >>> >>> And in .reload: >>> >>> ;; regular block artificial uses 28 [r28] 32 [__SP_L__] >>> ;; eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL] >>> ;; entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] >>> 14 >>> [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 >>> [r22] >>> 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] >>> ;; exit block uses 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] >>> ;; regs ever live 24[r24] 25[r25] 28[r28] 29[r29] >>> >>> Outcome is that the frame pointer is set up without need, which is very >>> costly >>> on avr. >>> >>> >>> The compiler is trunk from 2013-01-16 configured for avr: >>> >>> --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2 >>> >>> The example is compiled with >>> >>> avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1 >>> >>> >>> In what stage happens the misoptimization? >>> >>> Is this worth a PR? (I.e. is there a change that anybody cares?) >> >> Well, AVR folks may be interested, or people caring for fixed-point support. > > Besides the possible optimization in fold-const.c, data flow produces strange > results. > > Presumably it misses some properties of the frame pointer which spans two hard > registers? This is rather uncommon in GCC as far as I know. > > There were attempts to use HARD_FRAME_POINTER in the avr BE but the results > were completely unusable because a frame was set up in any function :-( > > > Johann >