Re: [RFC, LRA] Repeated looping over subreg reloads.
Vladimir Makarov wrote:
On 12/4/2013, 6:15 AM, Tejas Belagod wrote:

Hi,

I'm trying to relax CANNOT_CHANGE_MODE_CLASS for aarch64 to allow all mode changes on FP_REGS as aarch64 does not have register-packing, but I'm running into an LRA ICE. A test case generates an RTL subreg of the following form

(set (reg:DF 97) (subreg:DF (reg:V2DF 95) 8))

LRA has to reload the subreg because the subreg is not representable as a full register. When LRA reloads this in lra-constraints.c:simplify_operand_subreg (), it seems to reload SUBREG_REG() and leave the byte offset alone, i.e.

(set (reg:V2DF 100) (reg:V2DF 95))
(set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))

The code in lra-constraints.c is this conditional:

/* Force a reload of the SUBREG_REG if this is a constant or PLUS or if there may be a problem accessing OPERAND in the outer mode. */
if ((REG_P (reg) insert_move_for_subreg (insert_before ? &before : NULL, insert_after ? &after : NULL, reg, new_reg); }

What happens subsequently is that LRA keeps looping over this RTL and keeps reloading the SUBREG_REG() until the limit of constraint passes is reached.

(set (reg:V2DF 100) (reg:V2DF 95))
(set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))

I can't see any place where this subreg is resolved (e.g. into an equiv memref) before the next iteration comes around for reloading the inputs and outputs of curr_insn. Or am I missing some part of the code that tries reloading the subreg with different alternatives or reg classes?

I guess this behaviour is wrong. We could spill the V2DF pseudo or put it into another class reg. But it is not implemented. This code is actually a modified version of the reload pass one. We could implement alternative strategies and a check for a potential loop (such code exists in process_alt_operands). Could you send me the macro change and the test? I'll look at it and figure out what we can do.

Hi,

Thanks for looking at this.

The macro change is in this patch http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03638.html. The test is gcc.c-torture/compile/simd-3.c and when compiled with -O1 for aarch64, it ICEs:

gcc/testsuite/gcc.c-torture/compile/simd-3.c:22:1: internal compiler error: Maximum number of LRA constraint passes is achieved (30)

Also, I'm curious to know: is it possible to use vec_extract for vector mode subregs and zero/sign extract for scalars, with spilling as the last resort if neither of these is possible? As you say, a non-zero SUBREG_BYTE offset could also be resolved using a different regclass where the sub-mode could just be a full register.

Thanks,
Tejas.
Remove spam in GCC mailing list
Here are some spam posts on the mailing lists:
http://gcc.gnu.org/ml/gcc-bugs/2013-07/msg01127.html
http://gcc.gnu.org/ml/gcc/2013-04/msg00190.html
http://gcc.gnu.org/ml/gcc/2013-04/msg00276.html
http://gcc.gnu.org/ml/gcc/2013-04/msg00143.html
The mailing list administrators need to clean up the spam.
Re: Dependency confusion in sched-deps
Hi, On Thu, 5 Dec 2013, Maxim Kuvyrkov wrote: > Output dependency is the right type (write after write). Anti > dependency is write after read, and true dependency is read after write. > > Dependency type plays a role for estimating costs and latencies between > instructions (which affects performance), but using wrong or imprecise > dependency type does not affect correctness. In the context of GCC and the middle end's memory model this statement is not correct. For some dependency types we're using type-based aliasing to disambiguate, i.e. to ignore that dependency, while for others we don't. In particular a read-after-write memory-access dependency can be ignored if type info says they can't alias (because a program where both _would_ access the same memory would be invalid according to our mem model), but for write-after-read or write-after-write we cannot do that disambiguation (because the last write overrides the dynamic type of the memory cell even if it was incompatible with the one before). Ciao, Michael.
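As a small illustration of the write-after-read case described above, here is a sketch in plain C. The function name reuse_cell is made up for this example and is not taken from GCC. It relies on the C rule that a store into malloc'ed memory gives that memory a new effective type, so the program stays valid even though the types are incompatible, and the int store therefore must not be moved above the float read:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: reuse one heap cell first as a float, then as
   an int.  The int store is a write-after-read with respect to the
   float load; it changes the dynamic type of the cell, so it cannot be
   disambiguated away just because int and float do not alias.  */
int
reuse_cell (void)
{
  size_t n = sizeof (float) > sizeof (int) ? sizeof (float) : sizeof (int);
  void *cell = malloc (n);
  float *f;
  int *i;
  float seen;
  int result;

  assert (n > 0);
  if (cell == NULL)
    return -1;

  f = cell;
  *f = 1.5f;          /* write as float */
  seen = *f;          /* read as float */

  i = cell;
  *i = 42;            /* write as int, after the float read */
  result = (seen == 1.5f) ? *i : -1;

  free (cell);
  return result;
}
```

If a scheduler swapped the float read and the int store here, the read would see the bytes of 42 instead of 1.5, which is why this kind of dependency has to be kept.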
Libbacktrace backtrace_vector_finish
Hi!

I'm trying to understand how the backtrace_vector_* APIs are meant to work and be used, but at least for alloc.c I don't see how it can work properly:

Both backtrace_vector_grow and backtrace_vector_release use
base = realloc (vec->base, alc);
or
vec->base = realloc (vec->base, vec->size);
(note, in the latter case it is even a memory leak if realloc fails), but that assumes that vec->base has been returned by malloc/realloc etc. But,

void
backtrace_vector_finish (struct backtrace_state *state ATTRIBUTE_UNUSED,
struct backtrace_vector *vec)
{
vec->base = (char *) vec->base + vec->size;
vec->size = 0;
}

will change vec->base so that it is no longer an address returned by malloc/realloc, so the next time you call backtrace_vector_grow, if it actually needs to reallocate anything, it will crash in realloc or silently misbehave. If this works properly in the mmap.c implementation, perhaps backtrace_vector_finish in alloc.c should just call backtrace_vector_release and memset (vec, 0, sizeof (*vec)); ?

Jakub
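The suggestion above (release, then reset the vector) can be sketched with a toy vector. Everything here is hypothetical: struct toy_vector, toy_vector_grow and toy_vector_finish are made-up names for illustration and are not libbacktrace's types or functions. The point is only that the vector never keeps a pointer that realloc did not return:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy vector, not libbacktrace's struct backtrace_vector.  */
struct toy_vector
{
  void *base;   /* block returned by realloc, or NULL */
  size_t size;  /* bytes handed out so far */
  size_t cap;   /* total bytes allocated at base */
};

/* Reserve N more bytes and return a pointer to them, or NULL on
   failure.  On failure the old block is still owned by VEC.  */
static void *
toy_vector_grow (struct toy_vector *vec, size_t n)
{
  void *ret;

  assert (vec != NULL && n > 0);
  if (vec->cap - vec->size < n)
    {
      size_t cap = vec->cap ? vec->cap : 16;
      void *base;

      while (cap - vec->size < n)
        cap *= 2;
      base = realloc (vec->base, cap);
      if (base == NULL)
        return NULL;
      vec->base = base;
      vec->cap = cap;
    }
  ret = (char *) vec->base + vec->size;
  vec->size += n;
  return ret;
}

/* Finish the current allocation: trim it, hand the block to the caller
   and reset VEC so that the next grow starts from a fresh block.  */
static void *
toy_vector_finish (struct toy_vector *vec)
{
  void *ret;

  assert (vec != NULL);
  ret = vec->base;
  if (ret != NULL && vec->size > 0)
    {
      void *shrunk = realloc (ret, vec->size);
      if (shrunk != NULL)
        ret = shrunk;
    }
  vec->base = NULL;
  vec->size = 0;
  vec->cap = 0;
  return ret;
}
```

The trade-off of this sketch is that leftover capacity is given back rather than reused for the next list, which is simpler than offsetting the base pointer but may cost extra allocations.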
Re: C++ std headers and malloc, realloc poisoning
On 12/04/2013 04:03 PM, Jakub Jelinek wrote: I think the most important reason is that we want to handle out of mem cases consistently, so instead of malloc etc. we want users to use xmalloc etc. that guarantee non-NULL returned value, or fatal error and never returning. For operator new that is solvable through std::set_new_handler I guess, but for malloc we really don't want people to deal with checking NULL return values from those everywhere. A simple workaround would be to disable poisoning of malloc/realloc on OS X (or when the build machine uses libc++, if that's easy to detect). Jason
Re: C++ std headers and malloc, realloc poisoning
On Thu, 2013-12-05 at 10:45 -0500, Jason Merrill wrote: > On 12/04/2013 04:03 PM, Jakub Jelinek wrote: > > I think the most important reason is that we want to handle out of mem > > cases consistently, so instead of malloc etc. we want users to use xmalloc > > etc. that guarantee non-NULL returned value, or fatal error and never > > returning. For operator new that is solvable through std::set_new_handler > > I guess, but for malloc we really don't want people to deal with checking > > NULL return values from those everywhere. > > A simple workaround would be to disable poisoning of malloc/realloc on > OS X (or when the build machine uses libc++, if that's easy to detect). Whether libc++ uses malloc/realloc/free in some implementation in a header file or not is an implementation detail. It could use it today and stop doing so tomorrow ;) Maybe a configure option to disable the poisoning would be better in this case? Cheers, Oleg
Re: Truncate optimisation question
> The comment says that we're trying to match:
>
> 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int
> 2. (set (reg:QI) (plus:QI (mem:QI) (const_int)))
> 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int)))
> 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int)))
> 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int
> 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
> 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
> 8. (set (reg:SI) (leu:SI (plus:QI ...)))
>
> And I think that's what we should be matching in cases where the
> extension isn't redundant, even on RISC targets.

Which one(s) exactly? Most of the RISC targets we have are parameterized (WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc.) to avoid operations in modes smaller than the word mode.

> The problem here isn't really about which mode is on the plus,
> but whether we recognise that the extension instruction is redundant.
> I.e. we start with:
>
> (insn 9 8 10 2 (set (reg:SI 120)
> (plus:SI (subreg:SI (reg:QI 118) 0)
> (const_int -48 [0xffd0]))) test.c:6 -1
> (nil))
> (insn 10 9 11 2 (set (reg:SI 121)
> (and:SI (reg:SI 120)
> (const_int 255 [0xff]))) test.c:6 -1
> (nil))
> (insn 11 10 12 2 (set (reg:CC 100 cc)
> (compare:CC (reg:SI 121)
> (const_int 9 [0x9]))) test.c:6 -1
> (nil))
>
> and what we want combine to do is to recognise that insn 10 is redundant
> and reduce the sequence to:
>
> (insn 9 8 10 2 (set (reg:SI 120)
> (plus:SI (subreg:SI (reg:QI 118) 0)
> (const_int -48 [0xffd0]))) test.c:6 -1
> (nil))
> (insn 11 10 12 2 (set (reg:CC 100 cc)
> (compare:CC (reg:SI 120)
> (const_int 9 [0x9]))) test.c:6 -1
> (nil))
>
> But insn 11 is redundant on all targets, not just RISC ones.
> It isn't about whether the target has a QImode addition or not.

That's theoretical though since, on x86 for example, the redundant instruction isn't even generated because of the QImode addition...

> Well, I think making the simplify-rtx code conditional on the target
> would be the wrong way to go. If we really can't live with it being
> unconditional then I think we should revert it. But like I say I think
> it would be better to make combine recognise the redundancy even with
> the new form. (Or as I say, longer term, not to rely on combine to
> eliminate redundant extensions.) But I don't have time to do that myself...

It helps x86 so we won't revert it. My fear is that we'll need to add code in other places to RISCify back the result of this "simplification".

--
Eric Botcazou
Re: C++ std headers and malloc, realloc poisoning
On 12/05/2013 10:59 AM, Oleg Endo wrote: On Thu, 2013-12-05 at 10:45 -0500, Jason Merrill wrote: A simple workaround would be to disable poisoning of malloc/realloc on OS X (or when the build machine uses libc++, if that's easy to detect). Whether libc++ uses malloc/realloc/free in some implementation in a header file or not is an implementation detail. It could use it today and stop doing so tomorrow ;) Yep, which is why I described my suggestion as a workaround. :) But having the poisoning disabled when building with clang doesn't seem like a significant problem even if it becomes unnecessary, since any misuse will still show up when building stage 2 and on other platforms. Maybe a configure option to disable the poisoning would be better in this case? That seems unlikely to help users. Jason
Re: C++ std headers and malloc, realloc poisoning
On Thu, Dec 05, 2013 at 12:05:23PM -0500, Jason Merrill wrote: > On 12/05/2013 10:59 AM, Oleg Endo wrote: > >On Thu, 2013-12-05 at 10:45 -0500, Jason Merrill wrote: > >>A simple workaround would be to disable poisoning of malloc/realloc on > >>OS X (or when the build machine uses libc++, if that's easy to detect). > > > >Whether libc++ uses malloc/realloc/free in some implementation in a > >header file or not is an implementation detail. It could use it today > >and stop doing so tomorrow ;) > > Yep, which is why I described my suggestion as a workaround. :) > > But having the poisoning disabled when building with clang doesn't > seem like a significant problem even if it becomes unnecessary, > since any misuse will still show up when building stage 2 and on > other platforms. Guess the problem is that clang pretends to be an (old) version of GCC. Otherwise all the poisoning, which is guarded by: #if (GCC_VERSION >= 3000) wouldn't be applied. So perhaps we want a hack there && !defined __clang__ or similar. Jakub
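A minimal sketch of the guard suggested above, written as a standalone file rather than as a patch to GCC's headers. All TOY_* names and toy_xmalloc are hypothetical stand-ins invented for this example; only #pragma GCC poison and the __clang__ and __GNUC__ macros are real. It assumes the version number is computed as major * 1000 + minor:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GCC_VERSION macro mentioned above.  */
#if defined (__GNUC__)
# define TOY_GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__)
#else
# define TOY_GCC_VERSION 0
#endif

/* Poison only for real GCC; clang also defines __GNUC__, so it is
   excluded explicitly, as suggested in the message above.  */
#if (TOY_GCC_VERSION >= 3000) && !defined (__clang__)
# define TOY_POISONING_ENABLED 1
#else
# define TOY_POISONING_ENABLED 0
#endif

/* Hypothetical xmalloc-style wrapper: never returns NULL.  It has to be
   defined before malloc is poisoned.  */
static void *
toy_xmalloc (size_t n)
{
  void *p;

  assert (n > 0);
  p = malloc (n);
  if (p == NULL)
    {
      fputs ("out of memory\n", stderr);
      abort ();
    }
  return p;
}

#if TOY_POISONING_ENABLED
/* From here on, any direct use of these names is a compile error.  */
#pragma GCC poison malloc realloc
#endif
```

Code after the pragma has to call toy_xmalloc; free is left usable. Any header included after this point that mentions malloc would hit the same error this thread is about, which is the reason for wanting the guard.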
Re: C++ std headers and malloc, realloc poisoning
On Thu, 2013-12-05 at 18:11 +0100, Jakub Jelinek wrote: > On Thu, Dec 05, 2013 at 12:05:23PM -0500, Jason Merrill wrote: > > On 12/05/2013 10:59 AM, Oleg Endo wrote: > > >On Thu, 2013-12-05 at 10:45 -0500, Jason Merrill wrote: > > >>A simple workaround would be to disable poisoning of malloc/realloc on > > >>OS X (or when the build machine uses libc++, if that's easy to detect). > > > > > >Whether libc++ uses malloc/realloc/free in some implementation in a > > >header file or not is an implementation detail. It could use it today > > >and stop doing so tomorrow ;) > > > > Yep, which is why I described my suggestion as a workaround. :) > > > > But having the poisoning disabled when building with clang doesn't > > seem like a significant problem even if it becomes unnecessary, > > since any misuse will still show up when building stage 2 and on > > other platforms. > > Guess the problem is that clang pretends to be (old) version of GCC. > Otherwise all the poisioning, which is guarded by: > #if (GCC_VERSION >= 3000) > wouldn't be applied. So perhaps we want a hack there && !defined __clang__ > or similar. The problem is not clang but the exposed internals of libc++ (at least the version Apple currently ships). The problem would be the same if GCC was used as the compiler but with libc++ instead of libstdc++ (it seems some people have been trying to do that, see http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-August/010149.html) BTW, the #include in sh.c also triggered the "do_not_use_isalpha_with_safe_ctype" stuff in include/safe-ctype.h, which is a similar problem (isalpha being used in some implementation in libc++). Cheers, Oleg
Re: Truncate optimisation question
Eric Botcazou writes: >> The comment says that we're trying to match: >> >> 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int >> 2. (set (reg:QI) (plus:QI (mem:QI) (const_int))) >> 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int))) >> 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int))) >> 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int >> 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int))) >> 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int))) >> 8. (set (reg:SI) (leu:SI (plus:QI ...))) >> >> And I think that's what we should be matching in cases where the >> extension isn't redundant, even on RISC targets. > > Which one(s) exactly? Most of the RISC targets we have are parameterized > (WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc) to avoid operations in modes > smaller than the word mode. The first one, sorry. >> The problem here isn't really about which mode is on the plus, >> but whether we recognise that the extension instruction is redundant. >> I.e. we start with: >> >> (insn 9 8 10 2 (set (reg:SI 120) >> (plus:SI (subreg:SI (reg:QI 118) 0) >> (const_int -48 [0xffd0]))) test.c:6 -1 >> (nil)) >> (insn 10 9 11 2 (set (reg:SI 121) >> (and:SI (reg:SI 120) >> (const_int 255 [0xff]))) test.c:6 -1 >> (nil)) >> (insn 11 10 12 2 (set (reg:CC 100 cc) >> (compare:CC (reg:SI 121) >> (const_int 9 [0x9]))) test.c:6 -1 >> (nil)) >> >> and what we want combine to do is to recognise that insn 10 is redundant >> and reduce the sequence to: >> >> (insn 9 8 10 2 (set (reg:SI 120) >> (plus:SI (subreg:SI (reg:QI 118) 0) >> (const_int -48 [0xffd0]))) test.c:6 -1 >> (nil)) >> (insn 11 10 12 2 (set (reg:CC 100 cc) >> (compare:CC (reg:SI 120) >> (const_int 9 [0x9]))) test.c:6 -1 >> (nil)) >> >> But insn 11 is redundant on all targets, not just RISC ones. >> It isn't about whether the target has a QImode addition or not. > > That's theoritical though since, on x86 for example, the redundant > instruction > isn't even generated because of the QImode addition... 
Not for this testcase, sure, but we use an SImode addition and keep the equivalent redundant extension until combine for: int foo (unsigned char *x) { return (((unsigned int) *x - 48) & 0xff) < 10; } immediately before combine: (insn 7 6 8 2 (parallel [ (set (reg:SI 93 [ D.1753 ]) (plus:SI (reg:SI 92 [ D.1753 ]) (const_int -48 [0xffd0]))) (clobber (reg:CC 17 flags)) ]) /tmp/foo.c:3 261 {*addsi_1} (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil (insn 8 7 9 2 (set (reg:SI 94 [ D.1753 ]) (zero_extend:SI (subreg:QI (reg:SI 93 [ D.1753 ]) 0))) /tmp/foo.c:3 133 {*zero_extendqisi2} (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ]) (nil))) (insn 9 8 10 2 (set (reg:CC 17 flags) (compare:CC (reg:SI 94 [ D.1753 ]) (const_int 9 [0x9]))) /tmp/foo.c:3 7 {*cmpsi_1} (expr_list:REG_DEAD (reg:SI 94 [ D.1753 ]) (nil))) What saves us isn't QImode addition but QImode comparison: combine: (insn 7 6 8 2 (parallel [ (set (reg:SI 93 [ D.1753 ]) (plus:SI (reg:SI 92 [ D.1753 ]) (const_int -48 [0xffd0]))) (clobber (reg:CC 17 flags)) ]) /tmp/foo.c:3 261 {*addsi_1} (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil (note 8 7 9 2 NOTE_INSN_DELETED) (insn 9 8 10 2 (set (reg:CC 17 flags) (compare:CC (subreg:QI (reg:SI 93 [ D.1753 ]) 0) (const_int 9 [0x9]))) /tmp/foo.c:3 5 {*cmpqi_1} (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ]) (nil))) movzbl (%rdi), %eax subl$48, %eax cmpb$9, %al setbe %al movzbl %al, %eax ret (The patch didn't affect things here.) FWIW, change the testcase to: int foo (unsigned char *x) { return (((unsigned int) *x - 48) & 0x1ff) < 10; } and we keep the redundant AND, again regardless of whether the patch is applied. >> Well, I think making the simplify-rtx code conditional on the target >> would be the wrong way to go. If we really can't live with it being >> unconditional then I think we should revert it. 
But like I say I think >> it would be better to make combine recognise the redundancy even with >> the new form. (Or as I say, longer term, not to rely on combine to >> eliminate redundant extensions.) But I don't have time to do that myself... > > It helps x86 so we won't revert it. My fear is that we'll need to add > code in other places to RISCify back the result of this > "simplification". But that's the problem with trying to do the optimisation in this way. We first simplify a truncation of an SImode addition
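The claim that the AND is redundant in both testcases from the message above can be checked exhaustively in plain C. This is only a source-level sanity check, not a statement about what combine does; the names foo_ff, foo_1ff, foo_nomask and masks_are_redundant are made up for this sketch:

```c
#include <assert.h>

/* First testcase from the message above (mask 0xff).  */
int
foo_ff (const unsigned char *x)
{
  return (((unsigned int) *x - 48) & 0xff) < 10;
}

/* Second testcase from the message above (mask 0x1ff).  */
int
foo_1ff (const unsigned char *x)
{
  return (((unsigned int) *x - 48) & 0x1ff) < 10;
}

/* Hypothetical reference version with the AND removed.  */
int
foo_nomask (const unsigned char *x)
{
  return ((unsigned int) *x - 48) < 10;
}

/* Return 1 if all three functions agree for every byte value.  */
int
masks_are_redundant (void)
{
  unsigned int v;

  for (v = 0; v <= 255; v++)
    {
      unsigned char c = (unsigned char) v;

      assert (foo_ff (&c) == 0 || foo_ff (&c) == 1);
      if (foo_ff (&c) != foo_nomask (&c)
          || foo_1ff (&c) != foo_nomask (&c))
        return 0;
    }
  return 1;
}
```

For inputs below 48 the subtraction wraps to a large unsigned value whose low 8 or 9 bits are still at least 208, so neither mask can push the result under 10.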
Re: Libbacktrace backtrace_vector_finish
On Thu, Dec 5, 2013 at 7:32 AM, Jakub Jelinek wrote: > > I'm trying to understand how the backtrace_vector_* APIs are meant to work > and used, but at least for alloc.c don't see how it can work properly: > > Both backtrace_vector_grow and backtrace_vector_release use > base = realloc (vec->base, alc); > or > vec->base = realloc (vec->base, vec->size); > (note, in the latter case it is even a memory leak if realloc fails), > but that assumes that that vec->base has been returned by malloc/realloc > etc. But, > void > backtrace_vector_finish (struct backtrace_state *state ATTRIBUTE_UNUSED, > struct backtrace_vector *vec) > { > vec->base = (char *) vec->base + vec->size; > vec->size = 0; > } > will change vec->base so that it no longer is an address returned by > malloc/realloc, so next time you call backtrace_vector_grow, if it will > actually need to reallocate anything, it will crash in realloc or silently > misbehave. If this works properly in mmap.c implementation, perhaps > backtrace_vector_finish in alloc.c should just backtrace_vector_release > and memset (*vec, 0, sizeof (*vec)); ? You're quite right. That was dumb. Thanks for noticing. Fixed with this patch. Committed to mainline and 4.8 branch. Ian 2013-12-05 Ian Lance Taylor * alloc.c (backtrace_vector_finish): Add error_callback and data parameters. Call backtrace_vector_release. Return address base. * mmap.c (backtrace_vector_finish): Add error_callback and data parameters. Return address base. * dwarf.c (read_function_info): Get new address base from backtrace_vector_finish. * internal.h (backtrace_vector_finish): Update declaration. 
Index: dwarf.c === --- dwarf.c (revision 205711) +++ dwarf.c (working copy) @@ -2535,19 +2535,23 @@ read_function_info (struct backtrace_sta if (pfvec->count == 0) return; - addrs = (struct function_addrs *) pfvec->vec.base; addrs_count = pfvec->count; if (fvec == NULL) { if (!backtrace_vector_release (state, &lvec.vec, error_callback, data)) return; + addrs = (struct function_addrs *) pfvec->vec.base; } else { /* Finish this list of addresses, but leave the remaining space in the vector available for the next function unit. */ - backtrace_vector_finish (state, &fvec->vec); + addrs = ((struct function_addrs *) + backtrace_vector_finish (state, &fvec->vec, + error_callback, data)); + if (addrs == NULL) + return; fvec->count = 0; } Index: internal.h === --- internal.h (revision 205711) +++ internal.h (working copy) @@ -233,13 +233,17 @@ extern void *backtrace_vector_grow (stru struct backtrace_vector *vec); /* Finish the current allocation on VEC. Prepare to start a new - allocation. The finished allocation will never be freed. */ + allocation. The finished allocation will never be freed. Returns + a pointer to the base of the finished entries, or NULL on + failure. */ -extern void backtrace_vector_finish (struct backtrace_state *state, - struct backtrace_vector *vec); +extern void* backtrace_vector_finish (struct backtrace_state *state, + struct backtrace_vector *vec, + backtrace_error_callback error_callback, + void *data); -/* Release any extra space allocated for VEC. Returns 1 on success, 0 - on failure. */ +/* Release any extra space allocated for VEC. This may change + VEC->base. Returns 1 on success, 0 on failure. */ extern int backtrace_vector_release (struct backtrace_state *state, struct backtrace_vector *vec, Index: mmap.c === --- mmap.c (revision 205711) +++ mmap.c (working copy) @@ -230,12 +230,19 @@ backtrace_vector_grow (struct backtrace_ /* Finish the current allocation on VEC. 
*/ -void -backtrace_vector_finish (struct backtrace_state *state ATTRIBUTE_UNUSED, - struct backtrace_vector *vec) +void * +backtrace_vector_finish ( + struct backtrace_state *state ATTRIBUTE_UNUSED, + struct backtrace_vector *vec, + backtrace_error_callback error_callback ATTRIBUTE_UNUSED, + void *data ATTRIBUTE_UNUSED) { + void *ret; + + ret = vec->base; vec->base = (char *) vec->base + vec->size; vec->size = 0; + return ret; } /* Release any extra space allocated for VEC. */ Index: alloc.c === --- alloc.c (revision 205711) +++ alloc.c (working copy) @@ -113,12 +113,24 @@ backtrace_vector_grow (struct backtrace_ /* Finish the current allocation on VEC. */ -void -backtrace_vector_finish (struct backtrace_state *state ATTRIBUTE_UNUSED, - struct backtrace_vector *vec) +void * +backtrace_vector_finish (struct backtrace_state *state, + struct ba
Re: Dependency confusion in sched-deps
On 05-Dec-13 02:39 AM, Maxim Kuvyrkov wrote: Dependency type plays a role for estimating costs and latencies between instructions (which affects performance), but using wrong or imprecise dependency type does not affect correctness. On multi-issue architectures it does make a difference. Anti dependence permits the two instructions to be issued during the same cycle whereas true dependency and output dependency would forbid this. Or am I misinterpreting your comment?
gcc-4.8-20131205 is now available
Snapshot gcc-4.8-20131205 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20131205/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.8 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch revision 205719 You'll find: gcc-4.8-20131205.tar.bz2 Complete GCC MD5=c5f3079d76068b3d2a89356c278ef4cd SHA1=b5e77ad4395561ac3f13f3a635ed5d704bab3786 Diffs from 4.8-20131128 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: Dependency confusion in sched-deps
On 6/12/2013, at 4:25 am, Michael Matz wrote: > Hi, > > On Thu, 5 Dec 2013, Maxim Kuvyrkov wrote: > >> Output dependency is the right type (write after write). Anti >> dependency is write after read, and true dependency is read after write. >> >> Dependency type plays a role for estimating costs and latencies between >> instructions (which affects performance), but using wrong or imprecise >> dependency type does not affect correctness. > > In the context of GCC and the middle ends memory model this statement is > not correct. For some dependency types we're using type based aliasing to > disambiguate, i.e ignore that dependency, which for others we don't. In > particular a read-after-write memory-access dependency can be ignored if > type info says they can't alias (because a program where both _would_ > access the same memory would be invalid according to our mem model), but > for write-after-read or write-after-write we cannot do that disambiguation > (because the last write overrides the dynamic type of the memory cell even > if it was incompatible with the one before).

Yes, this is correct for dependencies between memory locations in the general context of GCC. [The clarifications below are for the benefit of Paolo and anyone else who wants to find out how GCC scheduling works.]

Scheduler dependency analysis is a user of the aforementioned alias analysis, and it simply won't create a dependency between instructions if alias analysis tells it that it is OK not to. In the context of the scheduler, the dependencies (and their types) are between instructions, not individual registers or memory locations. The mere fact of two instructions having a dependency of any kind will make the scheduler produce correct code. The difference between two instructions having a true vs anti vs output dependency will manifest itself in how close to the 1st instruction the 2nd one will be issued.

Furthermore, when two instructions have dependencies on several items (e.g., both on a register and on a memory location), the resulting dependency type is set to the greatest of the dependency types of all dependent items: true-dependency having the most weight, followed by anti-dependency, followed by output-dependency.

Consider the instructions

[r1] = r2
r1 = [r2]

The scheduler dependency analysis will find an anti-dependency on r1 and a true-dependency on memory locations (assuming [r1] and [r2] may alias). The resulting dependency between the instructions will be a true-dependency and the instructions will be scheduled several cycles apart. However, one might argue that [r1] and [r2] are unlikely to alias and scheduling these instructions back-to-back (downgrading the dependency type from true to anti) would produce better code on average. This is one of countless improvements that could be made to the GCC scheduler.

-- Maxim Kuvyrkov www.kugelworks.com
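The combining rule described above can be sketched in a few lines of C. This is a toy model, not GCC's sched-deps code: the enum, struct toy_insn and toy_dependency are hypothetical names, resources are plain bit masks, and all memory is assumed to alias (a single TOY_MEM bit):

```c
#include <assert.h>

/* Ordered by weight: true > anti > output, as described above.  */
enum toy_dep { TOY_DEP_NONE, TOY_DEP_OUTPUT, TOY_DEP_ANTI, TOY_DEP_TRUE };

/* Hypothetical resource bits; every memory access shares TOY_MEM.  */
enum { TOY_R1 = 1, TOY_R2 = 2, TOY_MEM = 4 };

struct toy_insn
{
  unsigned int reads;   /* resources the instruction reads */
  unsigned int writes;  /* resources the instruction writes */
};

/* Keep the heavier of two dependency types.  */
static enum toy_dep
toy_heavier (enum toy_dep a, enum toy_dep b)
{
  return a > b ? a : b;
}

/* Dependency of SECOND on FIRST, where FIRST comes earlier in program
   order: the heaviest type found over all shared resources.  */
enum toy_dep
toy_dependency (const struct toy_insn *first, const struct toy_insn *second)
{
  enum toy_dep dep = TOY_DEP_NONE;

  assert (first != 0 && second != 0);
  if (first->writes & second->writes)   /* write after write */
    dep = toy_heavier (dep, TOY_DEP_OUTPUT);
  if (first->reads & second->writes)    /* write after read */
    dep = toy_heavier (dep, TOY_DEP_ANTI);
  if (first->writes & second->reads)    /* read after write */
    dep = toy_heavier (dep, TOY_DEP_TRUE);
  return dep;
}
```

For the pair [r1] = r2 and r1 = [r2], the first instruction reads r1 and r2 and writes memory, and the second reads r2 and memory and writes r1. The model finds an anti-dependency on r1 and a true-dependency on memory, and reports the pair as a true-dependency.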
Re: Dependency confusion in sched-deps
On 6/12/2013, at 8:44 am, shmeel gutl wrote: > On 05-Dec-13 02:39 AM, Maxim Kuvyrkov wrote: >> Dependency type plays a role for estimating costs and latencies between >> instructions (which affects performance), but using wrong or imprecise >> dependency type does not affect correctness. > On multi-issue architectures it does make a difference. Anti dependence > permits the two instructions to be issued during the same cycle whereas true > dependency and output dependency would forbid this. > > Or am I misinterpreting your comment? On VLIW-flavoured machines without resource conflict checking, yes: it is critical not to use an anti dependency where an output or true dependency exists. This is the case, though, only because these machines do not follow sequential semantics for instruction execution (i.e., effects from previous instructions are not necessarily observed by subsequent instructions on the same or nearby cycles). On machines with internal resource conflict checking, having a wrong type on the dependency should not cause wrong behavior, but "only" suboptimal performance. Thank you, -- Maxim Kuvyrkov www.kugelworks.com
Re: Hmmm, I think we've seen this problem before (lto build):
On Mon, Dec 02, 2013 at 12:16:18PM +0100, Richard Biener wrote: > On Sun, Dec 1, 2013 at 12:30 PM, Toon Moene wrote: > > http://gcc.gnu.org/ml/gcc-testresults/2013-12/msg1.html > > > > FAILED: Bootstrap (build config: lto; languages: fortran; trunk revision > > 205557) on x86_64-unknown-linux-gnu > > > > In function 'release', > > inlined from 'release' at /home/toon/compilers/gcc/gcc/vec.h:1428:3, > > inlined from '__base_dtor ' at > > /home/toon/compilers/gcc/gcc/vec.h:1195:0, > > inlined from 'compute_antic_aux' at > > /home/toon/compilers/gcc/gcc/tree-ssa-pre.c:2212:0, > > inlined from 'compute_antic' at > > /home/toon/compilers/gcc/gcc/tree-ssa-pre.c:2493:0, > > inlined from 'do_pre' at > > /home/toon/compilers/gcc/gcc/tree-ssa-pre.c:4738:23, > > inlined from 'execute' at > > /home/toon/compilers/gcc/gcc/tree-ssa-pre.c:4818:0: > > /home/toon/compilers/gcc/gcc/vec.h:312:3: error: attempt to free a non-heap > > object 'worklist' [-Werror=free-nonheap-object] > >::free (v); > >^ > > lto1: all warnings being treated as errors > > make[4]: *** [/dev/shm/wd26755/cczzGuTZ.ltrans13.ltrans.o] Error 1 > > make[4]: *** Waiting for unfinished jobs > > lto-wrapper: make returned 2 exit status > > /usr/bin/ld: lto-wrapper failed > > collect2: error: ld returned 1 exit status > > Yes, I still see this - likely caused by IPA-CP / partial inlining and a > "bogus" > warning for unreachable code. I'm really sorry about the long delay here, I took a week off for Thanksgiving and then was pretty busy with other stuff :/ If I remove the now useless worklist.release (); on line 2211 of tree-ssa-pre.c, the lto bootstrap gets past this issue to a stage 2 / 3 comparison failure. 
However doing that also causes these two test failures in a normal bootstrap / regression test cycle Tests that now fail, but worked before: unix/-m32: 17_intro/headers/c++200x/stdc++.cc (test for excess errors) unix/-m32: 17_intro/headers/c++200x/stdc++_multiple_inclusion.cc (test for excess errors) both of these failures are because of this ICE Executing on host: /tmp/tmp.rsz07gSDni/test-objdir/./gcc/xg++ -shared-libgcc -B/tmp/tmp.rsz07gSDni/test-objdir/./gcc -nostdinc++ -L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src -L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/.libs -B/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/bin/ -B/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/lib/ -isystem /tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/include -isystem /tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/sys-include -m32 -B/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -fdiagnostics-color=never -D_GLIBCXX_ASSERT -fmessage-length=0 -ffunction-sections -fdata-sections -g -O2 -D_GNU_SOURCE -g -O2 -D_GNU_SOURCE -DLOCALEDIR="." 
-nostdinc++ -I/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/include -I/tmp/tmp.rsz07gSDni/libstdc++-v3/libsupc++ -I/tmp/tmp.rsz07gSDni/libstdc++-v3/include/backward -I/tmp/tmp.rsz07gSDni/libstdc++-v3/testsuite/util /tmp/tmp.rsz07gSDni/libstdc++-v3/testsuite/17_intro/headers/c++200x/stdc++_multiple_inclusion.cc -std=gnu++0x -S -m32 -o stdc++_multiple_inclusion.s(timeout = 600) spawn /tmp/tmp.rsz07gSDni/test-objdir/./gcc/xg++ -shared-libgcc -B/tmp/tmp.rsz07gSDni/test-objdir/./gcc -nostdinc++ -L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src -L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/.libs -B/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/bin/ -B/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/lib/ -isystem /tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/include -isystem /tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/sys-include -m32 -B/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -fdiagnostics-color=never -D_GLIBCXX_ASSERT -fmessage-length=0 -ffunction-sections -fdata-sections -g -O2 -D_GNU_SOURCE -g -O2 -D_GNU_SOURCE -DLOCALEDIR="." 
-nostdinc++ -I/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/include -I/tmp/tmp.rsz07gSDni/libstdc++-v3/libsupc++ -I/tmp/tmp.rsz07gSDni/libstdc++-v3/include/backward -I/tmp/tmp.rsz07gSDni/libstdc++-v3/testsuite/util /tmp/tmp.rsz07gSDni/libstdc++-v3/testsuite/17_intro/headers/c++200x/stdc++_multiple_inclusion.cc -std=gnu++0x -S -m32 -o stdc++_multiple_inclusion.s
cc1plus: internal compiler error: Segmentation fault
0xb8745f crash_signal
../..