Richard Biener wrote: > On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay wrote: >> Hi, suppose the following C code: >> >> >> static __inline__ __attribute__((__always_inline__)) >> _Fract rbits (const int i) >> { >> _Fract f; >> __builtin_memcpy (&f, &i, sizeof (_Fract)); >> return f; >> } >> >> _Fract func (void) >> { >> #if B == 1 >> return rbits (0x1234); >> #elif B == 2 >> return 0.14222r; >> #endif >> } >> >> >> Type-punning idioms like in rbits above are very common in libgcc, for >> example >> in fixed-bit.c. >> >> In this example, both compilation variants are equivalent (provided int and >> _Fract are 16 bits wide). The problem with the B=1 variant is that it is >> inefficient: >> >> Variant B=1 needs 2 instructions. > > B == 1 shows that fold-const.c native_interpret/encode_expr lack > support for FIXED_POINT_TYPE. > >> Variant B=2 needs 11 instructions, 9 of them are not needed at all.
I confused B=1 and B=2. The inefficient case with 11 instructions is B=1, of course. Would a patch like below be acceptable in the current stage? It's only the native_interpret and pretty much like the int case. Difference is that it rejects if the sizes don't match exactly. A new function is used for low-level construction of a const_fixed_from_double_int. Isn't there a better way? I wonder that such functionality is not already there... A quick test shows that it works reasonable on AVR. >> The problem goes as follows: >> >> The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to >> memory >> moves through the frame: >> >> (insn 5 4 6 (set (reg:HI 45) >> (const_int 4660 [0x1234])) bloat.c:5 -1 >> (nil)) >> >> (insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8]) >> (reg:HI 45)) bloat.c:5 -1 >> (nil)) >> >> (insn 7 6 8 (set (reg:HQ 46) >> (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1 >> (nil)) >> >> (insn 8 7 9 (set (reg:HQ 43 [ <retval> ]) >> (reg:HQ 46)) bloat.c:12 -1 >> (nil)) >> >> >> Is there a specific reason why this is not expanded as subreg like this? >> >> >> (set (reg:HQ 46) >> (subreg:HQ [(reg:HI 45)] 0)) > > Probably your target does not allow this? Not sure, but then a testcase like This (movhq) is supported, similar for other fixed-point modes moves: QQ, UQQ, HQ, UHQ, HA, UHA, SQ, USQ, SA, USA, DQ, UDQ, DA, UDA, TA, UTA. > static __inline__ __attribute__((__always_inline__)) > _Fract rbits (const int i) > { > _Fract f; > __builtin_memcpy (&f, &i, sizeof (_Fract)); > return f; > } > > _Fract func (int i) > { > return rbits (i); > } > > would be more interesting, as your testcase boils down to a constant > folding issue (see above). This case is optimized fine and the generated code is nice. >> The insns are analyzed in .dfinit: >> >> ;; regs ever live 24[r24] 25[r25] 29[r29] >> >> 24/25 is the return register, 28/29 is the frame pointer: >> >> >> (insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28) >> (const_int 1 [0x1])) [2 S2 A8]) >> (reg:HI 45)) bloat.c:5 82 {*movhi} >> >> >> Is there a reason why R28 is not marked as live? >> >> >> The memory accesses are optimized out in .fwprop2 so that no frame pointer is >> needed any more. >> >> However, in the subsequent passes, R29 is still reported as "regs ever live" >> which is not correct. >> >> The "regs ever live" don't change until .ira, where R28 springs to live >> again: >> >> ;; regs ever live 24[r24] 25[r25] 28[r28] 29[r29] >> >> And in .reload: >> >> ;; regular block artificial uses 28 [r28] 32 [__SP_L__] >> ;; eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL] >> ;; entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 >> [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] >> 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] >> ;; exit block uses 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] >> ;; regs ever live 24[r24] 25[r25] 28[r28] 29[r29] >> >> Outcome is that the frame pointer is set up without need, which is very >> costly >> on avr. >> >> >> The compiler is trunk from 2013-01-16 configured for avr: >> >> --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2 >> >> The example is compiled with >> >> avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1 >> >> >> In what stage happens the misoptimization? >> >> Is this worth a PR? (I.e. is there a change that anybody cares?) > > Well, AVR folks may be interested, or people caring for fixed-point support. Besides the possible optimization in fold-const.c, data flow produces strange results. Presumably it misses some properties of the frame pointer which spans two hard registers? This is rather uncommon in GCC as far as I know. There were attempts to use HARD_FRAME_POINTER in the avr BE but the results were completely unusable because a frame was set up in any function :-( Johann
Index: fold-const.c =================================================================== --- fold-const.c (revision 195244) +++ fold-const.c (working copy) @@ -7410,6 +7410,58 @@ native_interpret_int (tree type, const u /* Subroutine of native_interpret_expr. Interpret the contents of + the buffer PTR of length LEN as a FIXED_CST of type TYPE. + If the buffer cannot be interpreted, return NULL_TREE. */ + +static tree +native_interpret_fixed (tree type, const unsigned char *ptr, int len) +{ + int total_bytes = GET_MODE_SIZE (TYPE_MODE (type)); + int byte, offset, word, words; + unsigned char value; + double_int result; + FIXED_VALUE_TYPE fixed_value; + + if (total_bytes != len) + return NULL_TREE; + if (total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT) + return NULL_TREE; + + result = double_int_zero; + words = total_bytes / UNITS_PER_WORD; + + for (byte = 0; byte < total_bytes; byte++) + { + int bitpos = byte * BITS_PER_UNIT; + if (total_bytes > UNITS_PER_WORD) + { + word = byte / UNITS_PER_WORD; + if (WORDS_BIG_ENDIAN) + word = (words - 1) - word; + offset = word * UNITS_PER_WORD; + if (BYTES_BIG_ENDIAN) + offset += (UNITS_PER_WORD - 1) - (byte % UNITS_PER_WORD); + else + offset += byte % UNITS_PER_WORD; + } + else + offset = BYTES_BIG_ENDIAN ? (total_bytes - 1) - byte : byte; + value = ptr[offset]; + + if (bitpos < HOST_BITS_PER_WIDE_INT) + result.low |= (unsigned HOST_WIDE_INT) value << bitpos; + else + result.high |= (unsigned HOST_WIDE_INT) value + << (bitpos - HOST_BITS_PER_WIDE_INT); + } + + fixed_value = const_fixed_from_double_int (result, TYPE_MODE (type)); + + return build_fixed (type, fixed_value); +} + + +/* Subroutine of native_interpret_expr. Interpret the contents of the buffer PTR of length LEN as a REAL_CST of type TYPE. If the buffer cannot be interpreted, return NULL_TREE. */ @@ -7533,6 +7585,9 @@ native_interpret_expr (tree type, const case REAL_TYPE: return native_interpret_real (type, ptr, len); + case FIXED_POINT_TYPE: + return native_interpret_fixed (type, ptr, len); + case COMPLEX_TYPE: return native_interpret_complex (type, ptr, len); @@ -7557,6 +7612,7 @@ can_native_interpret_type_p (tree type) case BOOLEAN_TYPE: case POINTER_TYPE: case REFERENCE_TYPE: + case FIXED_POINT_TYPE: case REAL_TYPE: case COMPLEX_TYPE: case VECTOR_TYPE: Index: fixed-value.c =================================================================== --- fixed-value.c (revision 195244) +++ fixed-value.c (working copy) @@ -81,6 +81,24 @@ check_real_for_fixed_mode (REAL_VALUE_TY return FIXED_OK; } + +/* Construct a CONST_FIXED from a bit payload and machine mode MODE. + The bits in PAYLOAD are used verbatim. */ + +FIXED_VALUE_TYPE +const_fixed_from_double_int (double_int payload, enum machine_mode mode) +{ + FIXED_VALUE_TYPE value; + + gcc_assert (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_DOUBLE_INT); + + value.data = payload; + value.mode = mode; + + return value; +} + + /* Initialize from a decimal or hexadecimal string. */ void Index: fixed-value.h =================================================================== --- fixed-value.h (revision 195244) +++ fixed-value.h (working copy) @@ -49,6 +49,11 @@ extern FIXED_VALUE_TYPE fconst1[MAX_FCON const_fixed_from_fixed_value (r, m) extern rtx const_fixed_from_fixed_value (FIXED_VALUE_TYPE, enum machine_mode); +/* Construct a CONST_FIXED from a bit payload and machine mode MODE. + The bits in PAYLOAD are used verbatim. */ +extern FIXED_VALUE_TYPE const_fixed_from_double_int (double_int, + enum machine_mode); + /* Initialize from a decimal or hexadecimal string. */ extern void fixed_from_string (FIXED_VALUE_TYPE *, const char *, enum machine_mode);