Re: Variable DECL_SIZE
On Wed, Oct 3, 2012 at 7:05 PM, Ilya Enkovich wrote: > Hi, > > I fall into ssa verification failure when try to pass field's > DECL_SIZE as an operand for CALL_EXPR. The fail occurs if field's size > is not a constant. In such case DECL_SIZE holds a VAR_DECL and I need > to find it's proper SSA_NAME. I thought it may be the default one but > gimple_default_def returns NULL. What is the right way to obtain > correct ssa_name in such case? There is no way. You have to know that you need it's size at the point of gimplification. Later there is no way to recover it. Richard. > > Thanks, > Ilya
Re: Functions that are CSEable but not pure
On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill wrote: > In C++ there is a common idiom called "initialize on first use". In its > simplest form it looks like > > int& lazy_i() > { > static int i = init; > return i; > } > > If the initialization is expensive or order-sensitive, this is a useful > alternative to initialization on load > (http://www.parashift.com/c++-faq/static-init-order-on-first-use.html). > > An interesting property of such functions is that they only have > side-effects the first time they are called, so subsequent calls can be > optimized away to just use the return value of the first call. If the result is not needed, are we allowed to remove a call to this function? Basically, is the function allowed to "detect" if it was called before? If we can remove it, thus the function may not exploit knowledge on wheter it was called before then we can also perform partial redundancy elimination in if () lazy_i (); lazy_i (); so I really hope we can also DCE these calls - otherwise their property would be really weird, as CSE also involves a "DCE" piece: i = lazy_i (); j = lazy_j (); -> i = lazy_i (); lazy_j (); j = i; -> i = lazy_i (); j = i; But if we are allowed to DCE a lazy_i call with unused result then it really is "pure". Basically with this property we consider the static variable it writes to as local, as it cannot be inspected in any other way than by calling the function or inspecting its return value. So - what's wrong with using pure? That it doesn't "feel" correct? The I suppose we should simply re-word its definition ;) > Currently there is no way to express this in GCC so that the optimizers know > that multiple calls are redundant. Marking the function as pure causes > calls to be CSEd as desired, but also asserts that the first call has no > side-effects, which is not the case. > > My implementation of dynamic initialization of TLS variables as mandated by > the C++11 and OpenMP standards uses this idiom implicitly, so I'd really > like to be able to optimize away multiple calls. > > Right now we have the following ECF flags (among others): > > const = no read, no write, no side-effects, cseable > pure = no write, no side-effects, cseable > novops = no read, no write note that I consider novops a red-herring. novops is basically 'const' volatile, but we don't have a volatile flag on calls (we have it, but that's for the store to the result). I'd rather get rid of novops ... > looping_const_or_pure = wait, actually there are side-effects > > Seems like the difference between novops and looping_const_or_pure is > whether calls are subject to CSE. Is that right? No, novops and looping_const_or_pure tell you whether they are subject to DCE, all const/pure functions are subject to CSE, novops functions are not (as said above, novops is really a volatile const). > Rather than checking these flag bundles, it seems to me that we ought to > break them up into > > no reads > no writes > no side-effects > cseable side-effects no side-effects and cseable side-effects doesn't sound like a good distinction to me - side-effects are not 'cse'-able or they would not be side-effects. But I suppose it all boils down to my question above, where the PRE case is really the most interesting (can the formerly second call "detect" it used the initialized path or the non-initialized path?) > So const would be noread|nowrite|noside >pure would be nowrite|noside > const|looping would be noread|nowrite|cseside > pure|looping would be nowrite|cseside >novops would be noread|nowrite > > and the behavior I want would be just cseside. > > Does this make sense? Anyone more familiar with this area of the code want > to implement it for me? :} I'd say it's already implemented, just poorly documented. Use pure! Richard. > Jason
[AVR] Missing avr51-flash1.x in avr target specific tests?
Some tests in gcc/testsuite/gcc.target/avr/torture (builtins-2.c, for e.g.) have -Tavr51-flash1.x specified in dg-options. The tests currently fail with an unable to open linker script error for that file. Is that linker script supposed to be checked into source control? Or am I missing some configure/build option? Regards Senthil
Re: reverse bitfield patch
On Wed, Oct 3, 2012 at 12:07 AM, DJ Delorie wrote: > > Here's my current patch for the bitfield reversal feature I've been > working on for a while, with an RX-specific pragma to apply it > "globally". Could someone please review this? It would be nice > to get it in before stage1 closes again... ChangeLog missing, new functions need a toplevel comment documenting function, argument and return value as per coding conventions. Richard. > > Index: gcc/doc/extend.texi > === > --- gcc/doc/extend.texi (revision 192009) > +++ gcc/doc/extend.texi (working copy) > @@ -5427,12 +5427,74 @@ Note that the type visibility is applied > associated with the class (vtable, typeinfo node, etc.). In > particular, if a class is thrown as an exception in one shared object > and caught in another, the class must have default visibility. > Otherwise the two shared objects will be unable to use the same > typeinfo node and exception handling will break. > > +@item bit_order > +Normally, GCC allocates bitfields from either the least significant or > +most significant bit in the underlying type, such that bitfields > +happen to be allocated from lowest address to highest address. > +Specifically, big-endian targets allocate the MSB first, where > +little-endian targets allocate the LSB first. The @code{bit_order} > +attribute overrides this default, allowing you to force allocation to > +be MSB-first, LSB-first, or the opposite of whatever gcc defaults to. The > +@code{bit_order} attribute takes an optional argument: > + > +@table @code > + > +@item native > +This is the default, and also the mode when no argument is given. GCC > +allocates LSB-first on little endian targets, and MSB-first on big > +endian targets. > + > +@item swapped > +Bitfield allocation is the opposite of @code{native}. > + > +@item lsb > +Bits are allocated LSB-first. > + > +@item msb > +Bits are allocated MSB-first. > + > +@end table > + > +A short example demonstrates bitfield allocation: > + > +@example > +struct __attribute__((bit_order(msb))) @{ > + char a:3; > + char b:3; > +@} foo = @{ 3, 5 @}; > +@end example > + > +With LSB-first allocation, @code{foo.a} will be in the 3 least > +significant bits (mask 0x07) and @code{foo.b} will be in the next 3 > +bits (mask 0x38). With MSB-first allocation, @code{foo.a} will be in > +the 3 most significant bits (mask 0xE0) and @code{foo.b} will be in > +the next 3 bits (mask 0x1C). > + > +Note that it is entirely up to the programmer to define bitfields that > +make sense when swapped. Consider: > + > +@example > +struct __attribute__((bit_order(msb))) @{ > + short a:7; > + char b:6; > +@} foo = @{ 3, 5 @}; > +@end example > + > +On some targets, or if the struct is @code{packed}, GCC may only use > +one byte of storage for A despite it being a @code{short} type. > +Swapping the bit order of A would cause it to overlap B. Worse, the > +bitfield for B may span bytes, so ``swapping'' would no longer be > +defined as there is no ``char'' to swap within. To avoid such > +problems, the programmer should either fully-define each underlying > +type, or ensure that their target's ABI allocates enough space for > +each underlying type regardless of how much of it is used. > + > @end table > > To specify multiple attributes, separate them by commas within the > double parentheses: for example, @samp{__attribute__ ((aligned (16), > packed))}. > > Index: gcc/c-family/c-common.c > === > --- gcc/c-family/c-common.c (revision 192009) > +++ gcc/c-family/c-common.c (working copy) > @@ -310,12 +310,13 @@ struct visibility_flags visibility_optio > > static tree c_fully_fold_internal (tree expr, bool, bool *, bool *); > static tree check_case_value (tree); > static bool check_case_bounds (tree, tree, tree *, tree *); > > static tree handle_packed_attribute (tree *, tree, tree, int, bool *); > +static tree handle_bitorder_attribute (tree *, tree, tree, int, bool *); > static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *); > static tree handle_common_attribute (tree *, tree, tree, int, bool *); > static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *); > static tree handle_hot_attribute (tree *, tree, tree, int, bool *); > static tree handle_cold_attribute (tree *, tree, tree, int, bool *); > static tree handle_noinline_attribute (tree *, tree, tree, int, bool *); > @@ -601,12 +602,14 @@ const unsigned int num_c_common_reswords > const struct attribute_spec c_common_attribute_table[] = > { >/* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, > affects_type_identity } */ >{ "packed", 0, 0, false, false, false, > handle_packed_attribute , false}, > + { "bit_order", 0, 1, false, true, false, > + handle_bitorder_a
Re: inlined memcpy/memset degradation in gcc 4.6 or later
On Tue, Oct 2, 2012 at 4:19 PM, Walter Lee wrote: > > On TILE-Gx, I'm observing a degradation in inlined memcpy/memset in > gcc 4.6 and later versus gcc 4.4. Though I find the problem on > TILE-Gx, I think this is a problem for any architectures with > SLOW_UNALIGNED_ACCESS set to 1. > > Consider the following program: > > struct foo { > int x; > }; > > void copy(struct foo* f0, struct foo* f1) > { > memcpy (f0, f1, sizeof(struct foo)); > } > > In gcc 4.4, I get the desired inline memcpy: > > copy: > ld4sr1, r1 > st4 r0, r1 > jrp lr > > In gcc 4.7, however, I get inlined byte-by-byte copies: > > copy: > ld1u_add r10, r1, 1 > st1_add r0, r10, 1 > ld1u_add r10, r1, 1 > st1_add r0, r10, 1 > ld1u_add r10, r1, 1 > st1_add r0, r10, 1 > ld1u r10, r1 > st1 r0, r10 > jrp lr > > The inlining of memcpy is done in expand_builtin_memcpy in builtins.c. > Tracing through that, I see that the alignment of src_align and > dest_align, which is computed by get_pointer_alignment, has degraded: > in gcc 4.4 they are 32 bits, but in gcc 4.7 they are 8 bits. This > causes the loads generated by the inlined memcopy to be per-byte > instead of per-4-byte. > > Looking further, gcc 4.7 uses the "align" field in "struct > ptr_info_def" to compute the alignment. This field appears to be > initialized in get_ptr_info in tree-ssanames.c but it is always > initialized to 1 byte and does not appear to change. gcc 4.4 computes > its alignment information differently. > > I get the same byte-copies with gcc 4.8 and gcc 4.6. > > I see a couple related open PRs: 50417, 53535, but no suggested fixes > for them yet. Can anyone advise on how this can be fixed? Should I > file a new bug, or add this info to one of the existing PRs? There is no way to fix it. memcpy does not require aligned arguments and the merely presence of a typed pointer contains zero information of alignment for the middle-end. If you want to excercise C's notion of alignemnt requirements then do not use memcpy but *f0 = *f1; which works equally well. ISTR Tom had various patches improving alignment info for pointers but we cannot improve this degenerate case. Btw, the new beavior even fixed bugs. Richard. > Thanks, > > Walter >
Re: Functions that are CSEable but not pure
On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote: > On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill wrote: > If the result is not needed, are we allowed to remove a call to this > function? No. Unless you know the same function has been already called. > So - what's wrong with using pure? That it doesn't "feel" correct? No, it can modify memory. Consider: a.h: extern int g; struct S { S () { __atomic_add_fetch (&g, 1, __ATOMIC_RELAXED); s = 7; } int s; }; extern thread_local S s; liba.C: #include "a.h" thread_local S s; libb.C: #include "a.h" int g; void foo (S *&p, S *&q, int &a, int &b) { a = g; p = &s; // this runs S::S() in the current thread b = g; q = &s; } I think the plan was for these functions not to return any value, so that they could be aliases for all thread_local vars defined in the same TU. Then you really can't DCE the first call, you can the second call. If it returns void, CSEable is probably a bad word. Even if it returns value and the value is unused, it still needs to be called the first time in a function, and can modify anything the ctor(s) can modify. Jakub
[LRA] liveness update not done after elimination??
Hello Vlad, Is it intentional that DF_LR_IN and DF_LR_OUT are not updated after "Updating elimination of equiv for reg..."? I have some checking in place in process_bb_lives at the end of the function, and it triggers on the test case. (Checking code and test case is at the end of this e-mail.) It show: LIVEIN-mismatch in bb 5: 66, 69, 66, 69, 73 73 t.c: In function 'get_mem_attrs': t.c:47:1: internal compiler error: in process_bb_lives, at lra-lives.c:866 } ^ And that's correct if you look at the LR_IN set of basic block 5 in the IRA (.193r.ira) dump: ;; basic block 2, loop depth 0, count 0, freq 1, maybe hot ... 9 r73:SI=[argp:DI] REG_EQUIV: [argp:DI] ... ;; lr out 6 [bp] 7 [sp] 16 [argp] 20 [frame] 66 69 73 ... ;; basic block 5, loop depth 0, count 0, freq 5014, maybe hot ;; prev block 4, next block 6, flags: (RTL) ;; pred: 4 ;; bb 5 artificial_defs: { } ;; bb 5 artificial_uses: { u22(6){ }u23(7){ }u24(16){ }u25(20){ }} ;; lr in6 [bp] 7 [sp] 16 [argp] 20 [frame] 66 69 73 ;; lr use 6 [bp] 7 [sp] 16 [argp] 20 [frame] 69 73 ;; lr def 17 [flags] 74 76 ;; live in 6 [bp] 7 [sp] 16 [argp] 20 [frame] 66 69 73 ;; live gen 17 [flags] 74 76 ;; live kill 23 NOTE_INSN_BASIC_BLOCK 24 r74:DI=sign_extend(r73:SI) REG_DEAD: r73:SI 26 r76:DI=zero_extend([r74:DI+`mode_size']) REG_DEAD: r74:DI 27 flags:CCZ=cmp(r76:DI,[r69:DI]) REG_DEAD: r76:DI REG_DEAD: r69:DI 28 pc={(flags:CCZ==0)?L57:pc} REG_DEAD: flags:CCZ REG_BR_PROB: 0x19e ;; succ: 8 ;; 6 ;; lr out 6 [bp] 7 [sp] 16 [argp] 20 [frame] 66 ;; live out 6 [bp] 7 [sp] 16 [argp] 20 [frame] 66 But that insn setting reg 73 has been eliminated according to the LRA (.194r.reload) dump: Updating elimination of equiv for reg 73 ... Removing equiv init insn 9 (freq=1) 9 r73:SI=[argp:DI+0x30] REG_EQUIV: [argp:DI] deleting insn with uid = 9. Choosing alt 0 in insn 15: (0) r Choosing alt 0 in insn 18: (0) r Changing pseudo 73 in operand 0 of insn 21 on equiv [argp:DI+0x30] Choosing alt 0 in insn 21: (0) rm (1) re Changing pseudo 73 in operand 1 of insn 24 on equiv [argp:DI+0x30] Choosing alt 1 in insn 24: (0) r (1) rm Choosing alt 0 in insn 26: (0) =r (1) qm Choosing alt 1 in insn 27: (0) r (1) rm Choosing alt 1 in insn 31: (0) m (1) re You do update the DF_LR sets elsewhere in LRA, so I was wondering if it is intentional or not to leave the DF_LR sets unchanged after elimination. Ciao! Steven -- 8< -- EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i) mark_pseudo_dead (i); +{ + bitmap b = BITMAP_ALLOC (®_obstack); + bitmap_copy (b, DF_LR_IN (bb)); + bitmap_clear_range (b, 0, FIRST_PSEUDO_REGISTER); + EXECUTE_IF_SET_IN_BITMAP (DF_LR_IN (bb), FIRST_PSEUDO_REGISTER, j, bi) +if (sparseset_bit_p (pseudos_live, j)) + bitmap_clear_bit (b, j); + if (! bitmap_empty_p (b)) +{ + fprintf (stderr, "LIVEIN-mismatch in bb %d:\n", bb->index); + EXECUTE_IF_SET_IN_BITMAP (DF_LR_IN (bb), FIRST_PSEUDO_REGISTER, j, bi) +if (sparseset_bit_p (pseudos_live, j)) + fprintf (stderr, "%u, ", j); + fprintf (stderr, "\n"); + bitmap_clear_range (DF_LR_IN (bb), 0, FIRST_PSEUDO_REGISTER); + dump_bitmap (stderr, DF_LR_IN (bb)); + dump_bitmap (stderr, b); + gcc_unreachable (); +} + BITMAP_FREE (b); +} -- 8< -- -- 8< -- typedef long unsigned int size_t; union tree_node; typedef union tree_node *tree; struct rtx_def; typedef struct rtx_def *rtx; struct rtx_def { union u { long hwint[1]; } u; }; typedef struct mem_attrs { tree expr; rtx offset; rtx size; int alias; unsigned int align; } mem_attrs; extern void *ggc_alloc_stat (size_t); typedef struct htab *htab_t; extern void ** htab_find_slot (htab_t, const void *, int); static htab_t mem_attrs_htab; extern unsigned char mode_size[256]; mem_attrs *get_mem_attrs (int, tree, rtx, rtx, unsigned int, unsigned char, int); mem_attrs * get_mem_attrs (int alias, tree expr, rtx offset, rtx size, unsigned int align, unsigned char addrspace, int mode) { mem_attrs attrs; void **slot; if (alias == 0 && (size == 0 || (mode != 1 && mode_size[mode] == size->u.hwint[0]))) return 0; attrs.alias = alias; slot = htab_find_slot (mem_attrs_htab, &attrs, 1); return (mem_attrs *) *slot; } -- 8< --
Re: Functions that are CSEable but not pure
On 10/04/2012 08:45 AM, Jakub Jelinek wrote: On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote: On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill wrote: If the result is not needed, are we allowed to remove a call to this function? No. Unless you know the same function has been already called. Yes, we are. If we optimize away a use of the variable, we can also remove the call to the function. We can also hoist the call so its side-effects occur before other side-effects, as the standard only requires that the initialization of the variable occur some time between thread creation and the first use of the variable. So - what's wrong with using pure? That it doesn't "feel" correct? No, it can modify memory. Right, that's the issue. The back end assumes that calls to pure functions won't clobber memory; calls to these functions can clobber arbitrary memory, but only on the first call. I think the plan was for these functions not to return any value, No, I'm talking about the wrapper function which returns a reference to the variable (as in my example). Jason
Re: Functions that are CSEable but not pure
On Thu, Oct 04, 2012 at 08:56:03AM -0400, Jason Merrill wrote: > >I think the plan was for these functions not to return any value, > > No, I'm talking about the wrapper function which returns a reference > to the variable (as in my example). Sure, but I thought you want to inline the wrapper function as soon as possible. Or do you want to keep it as some kind of builtin that gets expanded during expansion to the test and call? Jakub
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 2:56 PM, Jason Merrill wrote: > On 10/04/2012 08:45 AM, Jakub Jelinek wrote: >> >> On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote: >>> >>> On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill wrote: >>> If the result is not needed, are we allowed to remove a call to this >>> function? >> >> >> No. Unless you know the same function has been already called. > > > Yes, we are. If we optimize away a use of the variable, we can also remove > the call to the function. We can also hoist the call so its side-effects > occur before other side-effects, as the standard only requires that the > initialization of the variable occur some time between thread creation and > the first use of the variable. > > >>> So - what's wrong with using pure? That it doesn't "feel" correct? >> >> >> No, it can modify memory. > > > Right, that's the issue. The back end assumes that calls to pure functions > won't clobber memory; calls to these functions can clobber arbitrary memory, > but only on the first call. Ugh. Especially with the above (you can DCE those calls) makes this severly mis-specified ... and any implementation error-prone (look what mess our losely defined 'malloc' attribute opened ...). I thought of a testcase like int *p = get_me (); .. = *p; int *q = get_me (); .. = *q; and get_me allocating/initalizing and returning a singleton. But you tell me it's more complicated and get_me () needs to be a barrier for any load/store (as it may modify arbitrary memory, but only on the "first" call). I think that "may modify arbitrary memory" isn't going to fly and my answer would be, better don't try to optimize anything here, at least not in generic terms. How would you handle this in the alias oracle? How would you then make CSE recognize two functions return the same value and are CSEable? >> I think the plan was for these functions not to return any value, > > > No, I'm talking about the wrapper function which returns a reference to the > variable (as in my example). Can you come up with a short but complete testcase illustrating the issue better (preferably C, so I don't need to read-in magic points where construction happens)? Thanks, Richard. > Jason >
IRA subregs playing badly with REG_EQUIV (Was: Reload reloading outdated values ?)
Thanks to the help of segher and iant on IRC (thanks again!), I narrowed my problem down to something I can fully understand and explain (well I hope). There is a bad interaction between the IRA handling of subregs and reload's use of REG_EQUIV annotations. Each one seems to be in its right to do what it does, but the end result is wrong generated code. Here's what happens, using some simplified RTL: Before IRA: (insn 1 [(set (subreg:SI (reg:DI 6835) 0) (set (subreg:SI (reg:DI 6835) 4)]) (insn 2 (set (reg:SI 229) (mem:SI (subreg:SI (reg:DI 6835) 0))) (REG_EQUIV (mem:SI (subreg:SI (reg:DI 6835) 0 (insn 3 (set (reg:SI 3346) (...))) (insn 4 (use (reg:SI 3346)) (REG_DEAD (reg:SI 3346))) (insn 5 (use (reg:SI 229))) (insn 6 (use (subreg:SI (reg:DI 6835) 4)) (REG_DEAD (reg:DI 6835))) IRA does the following allocation: pseudo reg 6835 gets hard reg 8 pseudo reg 3346 gets hard reg 8 pseudo reg 229 gets spilled IRA is in its right to allocate hard reg 8 for both reg 3346 and 6835, because it tracks conflicts at the word level, and SI reg 3346 only conflicts with the high part of DI reg 6835. Had reg 229 been allocated to a hard register, the code would be correct. In my example though, the register pressure is high and reg 229 gets no hard register. Reload then chooses to use the equivalent memory location for this register. We end up with a reload for (insn 5) that does: (insn 6061 (set (reg:SI 44) (mem:SI (reg:SI 8))) (insn 5 (use (reg:SI 44))) The expression used in (insn 6061) for the reload value is the allocated version of the REG_EQUIV expression, but at the point it gets inserted, r8 has been clobbered by (insn 4). To try to sum it up, the use of the memory equivalence by reload extends the liveness of one half the reg 6835, thus generating code that isn't wrong because the interference graph computed by IRA isn't right anymore. I'm not sure what the right fix is: 1/ Denying REG_EQUIVs which contain subregs 2/ Tightening the interference graph computed by IRA 3/ Some hack in reload to prevent this equivalence to be used in this case 1/ seems simple but I'm not sure about its consequences. 2/ would incur a degradation in register allocation quality. I don't know about 3/. What do people think?
Re: [AVR] Missing avr51-flash1.x in avr target specific tests?
Senthil Kumar Selvaraj wrote: > Some tests in gcc/testsuite/gcc.target/avr/torture (builtins-2.c, for > e.g.) have -Tavr51-flash1.x specified in dg-options. The tests currently > fail with an unable to open linker script error for that file. > > Is that linker script supposed to be checked into source control? Or am > I missing some configure/build option? I found no way to specify a linker script for one single test and supply that linker script in the testsuite source tree. It's likely that I used the wrong approach just because my limited knowledge on DejaGNU and GCC testsuite framework. The linker script must be located in a place where the compiler will search for it, e.g. some directory that contains the .x and that you added per -L to the ldflags in your DejaGNU board description. Johann
Re: Functions that are CSEable but not pure
On 10/04/2012 09:07 AM, Richard Guenther wrote: Ugh. Especially with the above (you can DCE those calls) makes this severly mis-specified ... and any implementation error-prone (look what mess our losely defined 'malloc' attribute opened ...). I thought of a testcase like int *p = get_me (); .. = *p; int *q = get_me (); .. = *q; and get_me allocating/initalizing and returning a singleton. Right. But you tell me it's more complicated and get_me () needs to be a barrier for any load/store (as it may modify arbitrary memory, but only on the "first" call). Yes, because the initialization is user-written code. I think that "may modify arbitrary memory" isn't going to fly and my answer would be, better don't try to optimize anything here, at least not in generic terms. How would you handle this in the alias oracle? How would you then make CSE recognize two functions return the same value and are CSEable? For aliasing purposes, the call is like a call to a normal function. For CSE purposes, we want to recognize identical calls and combine them. I don't know the GCC bits well enough to be any more specific. Can you come up with a short but complete testcase illustrating the issue better (preferably C, so I don't need to read-in magic points where construction happens)? int init_count; int data; void init() { static int initialized; if (!initialized) { data = ++init_count; initialized = 1; } } inline int *get_me() __attribute ((pure)); inline int *get_me() { init (); return &data; } int sink; int main() { sink = init_count; int *p = get_me(); if (init_count != 1) __builtin_abort(); int *q = get_me(); if (init_count != 1) __builtin_abort(); return *p + *q; } On this testcase, gcc -O2 doesn't reload init_count after the call to get_me because it thinks that the call can't have modified init_count. I want the compiler to know that it is safe to discard the redundant assignment, but not make assumptions about memory. On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at 08:56:03AM -0400, Jason Merrill wrote: > Sure, but I thought you want to inline the wrapper function as soon as > possible. Or do you want to keep it as some kind of builtin that gets > expanded during expansion to the test and call? Ah, I see your point; if get_me is inlined we end up with two calls to init, so it would be good to mark init with the same hypothetical attribute. Jason
Re: IRA subregs playing badly with REG_EQUIV (Was: Reload reloading outdated values ?)
On Thu, Oct 4, 2012 at 6:40 AM, Frederic Riss wrote: > Thanks to the help of segher and iant on IRC (thanks again!), I > narrowed my problem down to something I can fully understand and > explain (well I hope). > > There is a bad interaction between the IRA handling of subregs and > reload's use of REG_EQUIV annotations. Each one seems to be in its > right to do what it does, but the end result is wrong generated code. > Here's what happens, using some simplified RTL: > > Before IRA: > > (insn 1 [(set (subreg:SI (reg:DI 6835) 0) > (set (subreg:SI (reg:DI 6835) 4)]) > (insn 2 (set (reg:SI 229) (mem:SI (subreg:SI (reg:DI 6835) 0))) > (REG_EQUIV (mem:SI (subreg:SI (reg:DI 6835) 0 > (insn 3 (set (reg:SI 3346) (...))) > (insn 4 (use (reg:SI 3346)) > (REG_DEAD (reg:SI 3346))) > (insn 5 (use (reg:SI 229))) > (insn 6 (use (subreg:SI (reg:DI 6835) 4)) > (REG_DEAD (reg:DI 6835))) > > IRA does the following allocation: > pseudo reg 6835 gets hard reg 8 > pseudo reg 3346 gets hard reg 8 > pseudo reg 229 gets spilled > > IRA is in its right to allocate hard reg 8 for both reg 3346 and 6835, > because it tracks conflicts at the word level, and SI reg 3346 only > conflicts with the high part of DI reg 6835. Had reg 229 been > allocated to a hard register, the code would be correct. > > In my example though, the register pressure is high and reg 229 gets > no hard register. Reload then chooses to use the equivalent memory > location for this register. We end up with a reload for (insn 5) that > does: > > (insn 6061 (set (reg:SI 44) (mem:SI (reg:SI 8))) > (insn 5 (use (reg:SI 44))) > > The expression used in (insn 6061) for the reload value is the > allocated version of the REG_EQUIV expression, but at the point it > gets inserted, r8 has been clobbered by (insn 4). > > To try to sum it up, the use of the memory equivalence by reload > extends the liveness of one half the reg 6835, thus generating code > that isn't wrong because the interference graph computed by IRA isn't > right anymore. > > I'm not sure what the right fix is: > 1/ Denying REG_EQUIVs which contain subregs > 2/ Tightening the interference graph computed by IRA > 3/ Some hack in reload to prevent this equivalence to be used in this case > > 1/ seems simple but I'm not sure about its consequences. 2/ would > incur a degradation in register allocation quality. I don't know about > 3/. What do people think? I have not really looked at how IRA works. But I think that it must track subreg life and death somehow. So another approach would be for validate_equiv_mem in ira.c to observe when a memory address uses a subreg that has died. It already checks for full registers that have died. That code could be extended to handle subregs. Ian
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 5:22 PM, Jason Merrill wrote: > On 10/04/2012 09:07 AM, Richard Guenther wrote: >> >> Ugh. Especially with the above (you can DCE those calls) makes this >> severly mis-specified ... and any implementation error-prone (look what >> mess our losely defined 'malloc' attribute opened ...). >> >> I thought of a testcase like >> >> int *p = get_me (); >> .. = *p; >> int *q = get_me (); >> .. = *q; >> >> and get_me allocating/initalizing and returning a singleton. > > > Right. > > >> But you tell me it's more complicated and get_me () needs to >> be a barrier for any load/store (as it may modify arbitrary memory, >> but only on the "first" call). > > > Yes, because the initialization is user-written code. > > >> I think that "may modify arbitrary memory" isn't going to fly and >> my answer would be, better don't try to optimize anything here, >> at least not in generic terms. How would you handle this in >> the alias oracle? How would you then make CSE recognize >> two functions return the same value and are CSEable? > > > For aliasing purposes, the call is like a call to a normal function. For CSE > purposes, we want to recognize identical calls and combine them. I don't > know the GCC bits well enough to be any more specific. > > >> Can you come up with a short but complete testcase illustrating the issue >> better (preferably C, so I don't need to read-in magic points where >> construction >> happens)? > > > int init_count; > int data; > > void init() > { > static int initialized; > if (!initialized) > { > data = ++init_count; > initialized = 1; > } > } > > inline int *get_me() __attribute ((pure)); > inline int *get_me() > { > init (); > return &data; > } > > int sink; > > int main() > { > sink = init_count; > int *p = get_me(); > if (init_count != 1) > __builtin_abort(); > int *q = get_me(); > if (init_count != 1) > __builtin_abort(); > return *p + *q; > } > > On this testcase, gcc -O2 doesn't reload init_count after the call to get_me > because it thinks that the call can't have modified init_count. I want the > compiler to know that it is safe to discard the redundant assignment, but > not make assumptions about memory. But isn't it a fact that it _cannot_ modify init_count? If the second call is CSEable then it cannot have side-effects that are observable at the call site. Is the following an example you would consider to fall under your CSEing? int init_count; int data; int initialized; void init() { if (!initialized) { data = ++init_count; initialized = 1; } } inline int *get_me() __attribute ((pure)); inline int *get_me() { init (); return &data; } int sink; int main() { sink = init_count; int *p = get_me(); if (init_count != 1) __builtin_abort(); initialized = 0; int *q = get_me(); if (init_count != 2) __builtin_abort(); return *p + *q; } ? If so, then why can we assume get_me returns the same pointer even here? Richard. > > On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at > 08:56:03AM -0400, Jason Merrill wrote: >> Sure, but I thought you want to inline the wrapper function as soon as >> possible. Or do you want to keep it as some kind of builtin that gets >> expanded during expansion to the test and call? > > Ah, I see your point; if get_me is inlined we end up with two calls to init, > so it would be good to mark init with the same hypothetical attribute. > > Jason >
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 7:08 PM, Richard Guenther wrote: > On Thu, Oct 4, 2012 at 5:22 PM, Jason Merrill wrote: >> On 10/04/2012 09:07 AM, Richard Guenther wrote: >>> >>> Ugh. Especially with the above (you can DCE those calls) makes this >>> severly mis-specified ... and any implementation error-prone (look what >>> mess our losely defined 'malloc' attribute opened ...). >>> >>> I thought of a testcase like >>> >>> int *p = get_me (); >>> .. = *p; >>> int *q = get_me (); >>> .. = *q; >>> >>> and get_me allocating/initalizing and returning a singleton. >> >> >> Right. >> >> >>> But you tell me it's more complicated and get_me () needs to >>> be a barrier for any load/store (as it may modify arbitrary memory, >>> but only on the "first" call). >> >> >> Yes, because the initialization is user-written code. >> >> >>> I think that "may modify arbitrary memory" isn't going to fly and >>> my answer would be, better don't try to optimize anything here, >>> at least not in generic terms. How would you handle this in >>> the alias oracle? How would you then make CSE recognize >>> two functions return the same value and are CSEable? >> >> >> For aliasing purposes, the call is like a call to a normal function. For CSE >> purposes, we want to recognize identical calls and combine them. I don't >> know the GCC bits well enough to be any more specific. >> >> >>> Can you come up with a short but complete testcase illustrating the issue >>> better (preferably C, so I don't need to read-in magic points where >>> construction >>> happens)? >> >> >> int init_count; >> int data; >> >> void init() >> { >> static int initialized; >> if (!initialized) >> { >> data = ++init_count; >> initialized = 1; >> } >> } >> >> inline int *get_me() __attribute ((pure)); >> inline int *get_me() >> { >> init (); >> return &data; >> } >> >> int sink; >> >> int main() >> { >> sink = init_count; >> int *p = get_me(); >> if (init_count != 1) >> __builtin_abort(); >> int *q = get_me(); >> if (init_count != 1) >> __builtin_abort(); >> return *p + *q; >> } >> >> On this testcase, gcc -O2 doesn't reload init_count after the call to get_me >> because it thinks that the call can't have modified init_count. I want the >> compiler to know that it is safe to discard the redundant assignment, but >> not make assumptions about memory. > > But isn't it a fact that it _cannot_ modify init_count? If the second call > is CSEable then it cannot have side-effects that are observable at > the call site. Is the following an example you would consider to fall > under your CSEing? > > int init_count; > int data; > int initialized; > > void init() > { > if (!initialized) > { > data = ++init_count; > initialized = 1; > } > } > > inline int *get_me() __attribute ((pure)); > inline int *get_me() > { > init (); > return &data; > } > > int sink; > > int main() > { > sink = init_count; > int *p = get_me(); > if (init_count != 1) > __builtin_abort(); > initialized = 0; > int *q = get_me(); > if (init_count != 2) > __builtin_abort(); > return *p + *q; > } > > ? If so, then why can we assume get_me returns the same pointer even here? Or similar: int main() { sink = init_count; int *p = get_me(); if (init_count != 1) __builtin_abort(); } is this required to not abort? p is unused and with 'pure' you'd DCE the call. That is, I am confused about the distinction you seem to make between the static variable 'initialized' and the global 'init_count'. You seem to imply that the attribute would mean that we can CSE side-effects to global memory but that those side-effects may be value-changing! In practice for any CSE implementation this hypotetical "cse-side-effects" attribute would only allow us to CSE if there are no intermediate side-effects between two such calls, and as soon as the second get_me call would be CSEd we'd also CSE the init_count value (which you didn't want to CSE?!) Still confused ;) Richard. > Richard. > >> >> On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at >> 08:56:03AM -0400, Jason Merrill wrote: >>> Sure, but I thought you want to inline the wrapper function as soon as >>> possible. Or do you want to keep it as some kind of builtin that gets >>> expanded during expansion to the test and call? >> >> Ah, I see your point; if get_me is inlined we end up with two calls to init, >> so it would be good to mark init with the same hypothetical attribute. >> >> Jason >>
Re: Functions that are CSEable but not pure
On Thu, Oct 04, 2012 at 07:08:02PM +0200, Richard Guenther wrote: > But isn't it a fact that it _cannot_ modify init_count? If the second call > is CSEable then it cannot have side-effects that are observable at > the call site. Is the following an example you would consider to fall > under your CSEing? > > int init_count; > int data; > int initialized; > > void init() > { > if (!initialized) > { > data = ++init_count; > initialized = 1; > } > } > > inline int *get_me() __attribute ((pure)); > inline int *get_me() > { > init (); > return &data; > } > > int sink; > > int main() > { > sink = init_count; > int *p = get_me(); > if (init_count != 1) > __builtin_abort(); > initialized = 0; > int *q = get_me(); > if (init_count != 2) > __builtin_abort(); > return *p + *q; > } The above isn't a singleton function, as you are clearing initialized, therefore it doesn't have the properties the thread_local var get_address wrapper has. The singleton function really is void singleton (void) { static __thread bool initialized; if (!initialized) { initialized = true; call_some_function_that_may_modify_memory (); } } and has side effects just first time in a thread. Jakub
Re: Functions that are CSEable but not pure
On 10/04/2012 01:15 PM, Richard Guenther wrote: That is, I am confused about the distinction you seem to make between the static variable 'initialized' and the global 'init_count'. The distinction is that "initialized" is an implementation detail that once set is never cleared; it is the mechanism by which only the first call has side effects. init_count is a normal user variable. Jason
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 7:15 PM, Jakub Jelinek wrote: > On Thu, Oct 04, 2012 at 07:08:02PM +0200, Richard Guenther wrote: >> But isn't it a fact that it _cannot_ modify init_count? If the second call >> is CSEable then it cannot have side-effects that are observable at >> the call site. Is the following an example you would consider to fall >> under your CSEing? >> >> int init_count; >> int data; >> int initialized; >> >> void init() >> { >> if (!initialized) >> { >> data = ++init_count; >> initialized = 1; >> } >> } >> >> inline int *get_me() __attribute ((pure)); >> inline int *get_me() >> { >> init (); >> return &data; >> } >> >> int sink; >> >> int main() >> { >> sink = init_count; >> int *p = get_me(); >> if (init_count != 1) >> __builtin_abort(); >> initialized = 0; >> int *q = get_me(); >> if (init_count != 2) >> __builtin_abort(); >> return *p + *q; >> } > > The above isn't a singleton function, as you are clearing initialized, > therefore it doesn't have the properties the thread_local var get_address > wrapper has. The singleton function really is > void singleton (void) > { > static __thread bool initialized; > if (!initialized) { > initialized = true; > call_some_function_that_may_modify_memory (); > } > } > and has side effects just first time in a thread. So I suppose the testcase that would be "valid" but break with using pure would be instead int init_count; int data; void init() { static int initialized; if (!initialized) { data = ++init_count; initialized = 1; } } inline int *get_me() __attribute ((pure)); inline int *get_me() { init (); return &data; } int main() { int x = init_count; int *p = get_me(); if (init_count == x) __builtin_abort(); int *q = get_me(); if (init_count == x) __builtin_abort(); } here when get_me is pure we CSE init_count over the _first_ call of get_me. Technically we only might CSE it over the second call of get_me. Thus there is flow-sensitive information coming out of nowhere when we try to skip the second get_me () call in looking up the init_count load after it (coming out of nowhere for the alias-oracle query which only includes 'init_count' and the stmt 'int *q = get_me ()'). Thus my answer is, for the alias-oracle the hypothetical 'singleton_init' attribute provides no useful information. Thus CSE and/or DCE of 'singleton_init' calls has to use special code (which I can think of being added to DCE, harder to existing SCCVN, maybe easier to DOM) which will usually not fit into the algorithm (so you might be able to teach FRE elimination of how to elminiate the 2nd call to get_me () but you will need a 2nd FRE pass to then figure out that init_count can be CSEd as well - which makes this all very undesirable). Inlining will also make this harder (you lose the attribute and fall into the lucas situation ...). At some point this "singleton"-ness should be auto-detected (there is inlined code in SPEC 2k lucas that would benefit from constant propagation, and SPEC 2k6 libquantum that uses a related concept - change / restore of a global in a function to the net effect that the function should not be considered clobbering it). Richard. > Jakub
Re: inlined memcpy/memset degradation in gcc 4.6 or later
On Tue, Oct 2, 2012 at 4:19 PM, Walter Lee wrote: > > On TILE-Gx, I'm observing a degradation in inlined memcpy/memset in > > gcc 4.6 and later versus gcc 4.4. Though I find the problem on > > TILE-Gx, I think this is a problem for any architectures with > > SLOW_UNALIGNED_ACCESS set to 1. > > > > Consider the following program: > > > > struct foo { > > int x; > > }; > > > > void copy(struct foo* f0, struct foo* f1) > > { > > memcpy (f0, f1, sizeof(struct foo)); > > } > > > > In gcc 4.4, I get the desired inline memcpy: ... > > In gcc 4.7, however, I get inlined byte-by-byte copies: ... On Thu, Oct 04, 2012 at 01:58:54PM +0200, Richard Guenther wrote: > There is no way to fix it. memcpy does not require aligned arguments > and the merely presence of a typed pointer contains zero information > of alignment for the middle-end. If you want to excercise C's notion > of alignemnt requirements then do not use memcpy but > > *f0 = *f1; > > which works equally well. Perhaps I'm missing something. While memcpy is not permitted to assume alignment of its arguments, copy is. Otherwise, if I wrote void copy(struct foo* f0, struct foo* f1) { *f0 = *f1; } the compiler would also be forbidden from assuming any alignment. So, when compiling "copy", do we lack the information that the memcpy call is to the standard ISO memcpy function? If we know that it is the standard function we should be able to do the copy assuming everything is properly aligned. > Btw, the new beavior even fixed bugs. Could you point to a PR that was fixed by the change? There must be some way to distinguish this case from those cases.
Re: Functions that are CSEable but not pure
On 10/04/2012 01:42 PM, Richard Guenther wrote: So I suppose the testcase that would be "valid" but break with using pure would be instead int main() { int x = init_count; int *p = get_me(); if (init_count == x) __builtin_abort(); int *q = get_me(); if (init_count == x) __builtin_abort(); } here when get_me is pure we CSE init_count over the _first_ call of get_me. That's OK for C++ thread_local semantics; the initialization is specified to happen some time before the first use, so the testcase is making an invalid assumption. This might not be desirable for user-written singleton functions, though. Jason
Re: ARM/getting rid of superfluous zero extension
> Any suggestion about how I could avoid generating this zero_extension? Redundant extensions have been a hot topic for some time. The combiner should catch the local easy cases, we have ree.c for the nonlocal easy cases and Tom recently posted: http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00408.html which should catch more complex cases. I guess someone should gather the various missed optimization cases, draw a global picture of the situation and see how the various approaches could be fitted together. -- Eric Botcazou
Re: reverse bitfield patch
> ChangeLog missing, new functions need a toplevel comment documenting > function, argument and return value as per coding conventions. Any review of the patch itself? I know the overhead is not there...
libbacktrace and darwin
Is libbacktrace currently functional in gcc trunk and is it expected to function on darwin? While I could understand it not working on installed binaries of FSF gcc that were stripped, I would think it should work for make check in the build tree since all of the debug code should be present in the object files. Or doesn't libbacktrace know to look there for the dwarf code? Thanks in advance for any clarifications. Jack
Re: libbacktrace and darwin
On Thu, Oct 4, 2012 at 1:32 PM, Jack Howarth wrote: >Is libbacktrace currently functional in gcc trunk and is it expected > to function on darwin? While I could understand it not working on installed > binaries of FSF gcc that were stripped, I would think it should work for > make check in the build tree since all of the debug code should be present > in the object files. Or doesn't libbacktrace know to look there for the > dwarf code? Thanks in advance for any clarifications. libbacktrace is functional in GCC trunk. However, it does not yet support the Mach-O object file format. I hope to work on that at some point, but it would be great if somebody else tackled it. It's probably straightforward to implement based on code in libiberty/simple-object-mach-o.c. The libbacktrace code can be simpler than the libiberty code--compare libiberty/simple-object-elf.c and libbacktrace/elf.c. Ian
Re: ARM/getting rid of superfluous zero extension
On Thu, Oct 4, 2012 at 2:18 PM, Eric Botcazou wrote: >> Any suggestion about how I could avoid generating this zero_extension? > > Redundant extensions have been a hot topic for some time. The combiner should > catch the local easy cases, we have ree.c for the nonlocal easy cases and Tom > recently posted: > http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00408.html > which should catch more complex cases. > > I guess someone should gather the various missed optimization cases, draw a > global picture of the situation and see how the various approaches could be > fitted together. Kenny already did that as part of the thread with Tom. Tom's patch also is incomplete. - David
Re: ARM/getting rid of superfluous zero extension
David and i have been talking about this for some time. what is needed is a real global optimization algorithm. my leaning is to make do it at the rtl level because that is where everything has been exposed. but it would be a lot easier in ssa form. The first step in my opinion is to ask the backend what kind of operations are expensive and what kinds of operations are cheap. for instance some risc machines can do comparisons at any mode and some only do them on the width of a register. also some machines have different costs associated with sign or zero extension. I really think that you not only need to connect up the sources and syncs but you have to understand the space of what kind of transformations changes are going to be cheapest. People do not seem to ask this kind of question at the tree level which is one of the reasons that i lean to doing it at the rtl level. but the chain that goes from a load or some other operation that has a specific width output to the sync that has a specific width input can be a long one and only something equivalent to ssa or dataflow is going to let you connect the true sources with the true syncs. this is the problem with tom's patch.One of the examples that i have seen is a loop that has di mode add of an index variable and (had) an si mode comparison to exit the loop.However, later passes removed the si mode comparison because on the ppc it used the loop count instruction.but the add of the index variable with sign extension remained because it was an index variable and so the value fed back into itself. you are never going clean this up unless you do something global so that you can break the recurrence and see that there is no sync to that web that really needs the value to be si mode. kenny On 10/04/2012 06:16 PM, David Edelsohn wrote: On Thu, Oct 4, 2012 at 2:18 PM, Eric Botcazou wrote: Any suggestion about how I could avoid generating this zero_extension? Redundant extensions have been a hot topic for some time. The combiner should catch the local easy cases, we have ree.c for the nonlocal easy cases and Tom recently posted: http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00408.html which should catch more complex cases. I guess someone should gather the various missed optimization cases, draw a global picture of the situation and see how the various approaches could be fitted together. Kenny already did that as part of the thread with Tom. Tom's patch also is incomplete. - David
Error reporting functions
Hi Everyone, This question is mainly for some future submission. Am I allowed to use "fatal_error (..)"? Mainly, I want to use it in cases where I want to say "if this error has occurred, I see no reason to go forward with compilation." Thanks, Balaji V. Iyer.
Re: Error reporting functions
On Thu, Oct 4, 2012 at 5:21 PM, Iyer, Balaji V wrote: > This question is mainly for some future submission. Am I allowed to > use "fatal_error (..)"? Mainly, I want to use it in cases where I want to say > "if this error has occurred, I see no reason to go forward with compilation." You are allowed to use it in exceptional circumstances. Not normally "I see no reason to go forward" but "I can not go forward because some critical operation has failed." There is also internal_error for, well, internal errors; use that for data corruption. And gcc_assert and gcc_unreachable for things that can not happen. Ian
Re: [LRA] liveness update not done after elimination??
On 12-10-04 8:53 AM, Steven Bosscher wrote: Hello Vlad, Is it intentional that DF_LR_IN and DF_LR_OUT are not updated after "Updating elimination of equiv for reg..."? I have some checking in place in process_bb_lives at the end of the function, and it triggers on the test case. (Checking code and test case is at the end of this e-mail.) It show: LIVEIN-mismatch in bb 5: 66, 69, 66, 69, 73 73 t.c: In function 'get_mem_attrs': t.c:47:1: internal compiler error: in process_bb_lives, at lra-lives.c:866 } ^ And that's correct if you look at the LR_IN set of basic block 5 in the IRA (.193r.ira) dump: Ops. I'll look at this. As I remember, I thought about the problem. But I did not wanted to update live info because it is expensive and constraint equiv substitution by safe cases. Somehow I added unsafe cases (containing pseudos) and even added the code to spill pseudos in substitutions when they start to conflict with others because of widening live ranges but forgot about live info. I'll try to fix it as soon as possible.