[Bug c/63303] Pointer subtraction is broken when using -fsanitize=undefined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63303

Joshua Green changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jvg1981 at aim dot com

--- Comment #17 from Joshua Green ---
"But if we don't know which pointer is greater, it gets more complicated: ..."

I'm not sure that this is true.  For types that are larger than 1 byte, it
seems that one can do the subtraction after any division(s), hence only
costing an additional division (or shift):

T * p;
T * q;

.
.
.

intptr_t pVal = ((uintptr_t) p)/(sizeof *p);
intptr_t qVal = ((uintptr_t) q)/(sizeof *q);

ptrdiff_t p_q = pVal - qVal;

This should work in well-defined cases, for if p and q are pointers into the
same array then (presumably) ((uintptr_t) p) and ((uintptr_t) q) must have the
same remainder modulo sizeof(T).

Of course, even an additional shift may be too expensive in some cases, so
it's not entirely clear that this change should be made.
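A minimal, self-contained sketch of the divide-before-subtract idea above; the
element type T, the helper name ptr_sub, and the test harness are illustrative
assumptions rather than anything taken from the report:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef long T;   /* stand-in element type with sizeof(T) > 1 */

/* Divide each address by sizeof(T) before subtracting.  If p and q point
   into the same array, both addresses have the same remainder modulo
   sizeof(T), so the quotients differ by exactly the element distance,
   and for sizeof(T) >= 2 the signed subtraction below cannot overflow. */
static ptrdiff_t ptr_sub(const T *p, const T *q)
{
    intptr_t pVal = (intptr_t) ((uintptr_t) p / sizeof *p);
    intptr_t qVal = (intptr_t) ((uintptr_t) q / sizeof *q);
    return (ptrdiff_t) (pVal - qVal);
}

int main(void)
{
    T a[10];
    printf("%td\n", ptr_sub(&a[7], &a[2]));   /* prints 5  */
    printf("%td\n", ptr_sub(&a[2], &a[7]));   /* prints -5 */
    return 0;
}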
[Bug c/63303] Pointer subtraction is broken when using -fsanitize=undefined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63303

--- Comment #20 from Joshua Green ---
> "But if we don't know which pointer is greater, it gets more complicated:
> ..."
>
> I'm not sure that this is true.  For types that are larger than 1 byte, it
> seems that one can do the subtraction after any division(s), hence only
> costing an additional division (or shift):
>
> T * p;
> T * q;
>
> .
> .
> .
>
> intptr_t pVal = ((uintptr_t) p)/(sizeof *p);
> intptr_t qVal = ((uintptr_t) q)/(sizeof *q);
>
> ptrdiff_t p_q = pVal - qVal;
>
> This should work in well-defined cases, for if p and q are pointers into the
> same array then (presumably) ((uintptr_t) p) and ((uintptr_t) q) must have
> the same remainder modulo sizeof(T).
>
> Of course, even an additional shift may be too expensive in some cases, so
> it's not entirely clear that this change should be made.

It occurred to me that such contortions can be avoided in the (possibly)
common case when (say) q is actually an array.
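One hedged reading of that remark, with illustrative names (T and index_of are
assumptions): when q names the array itself, any valid p satisfies p >= q, so
the byte difference is non-negative and a single division suffices, with no
case split on which pointer is greater:

#include <stddef.h>
#include <stdint.h>

typedef long T;   /* illustrative element type */

/* q is the array base, so the byte difference is never negative and the
   "which pointer is greater" complication does not arise. */
static ptrdiff_t index_of(const T *p, const T q[])
{
    return (ptrdiff_t) (((uintptr_t) p - (uintptr_t) q) / sizeof *q);
}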
[Bug tree-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

jvg1981 at aim dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jvg1981 at aim dot com

--- Comment #5 from jvg1981 at aim dot com ---
I recently came across this surprising behavior.  Has anyone taken a serious
look at it?  Is it likely to be corrected/changed?
[Bug tree-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

--- Comment #7 from Joshua Green ---
(In reply to Segher Boessenkool from comment #6)
> bb-reorder changes the conditional branch so that the fallthrough path
> is the most likely.  It now also does this for -O1.  This is faster on
> essentially all processors, including the ones the OP mentions.
>
> Without profiling information showing otherwise, GCC assumes the call
> to bar2 is more frequent than the one to bar1 (61% vs. 39%).  This is
> a heuristic, it might need retuning, but that needs a lot more data
> than this one testcase.
>
> Closing as invalid.

While I agree that this isn't really a bug, I find the above reasoning hard to
follow.  The compiler could treat the original foo as

if (i) {
    bar1();
} else {
    bar2();
}

or

if (!i) {
    bar2();
} else {
    bar1();
}

and I see no reason why expecting the "else" block should a priori be
preferable in either case.  (It's also not clear HOW this could be "faster on
essentially all processors" in either case, though I'm open to being corrected
and/or enlightened on this subject.)

Of course, the compiler is free to make whatever guess it wants, but it would
be nice if the programmer had some portable way of expressing his/her own
expectations, and it seems that other compilers provide that by "agreeing" to
expect the "if" block (as, indeed, various online articles recommend).
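GCC does expose a way to state such an expectation directly, via
__builtin_expect (a GCC extension rather than portable C); a short sketch of
the testcase with that hint, where the LIKELY macro name is illustrative:

#include <stdbool.h>

void bar1(void);
void bar2(void);

/* Illustrative macro name; __builtin_expect is a GCC extension. */
#define LIKELY(x) __builtin_expect(!!(x), 1)

void foo(bool i)
{
    if (LIKELY(i))    /* tell GCC the bar1() path is the expected one */
        bar1();
    else
        bar2();
}

With such a hint (or with real profile feedback via -fprofile-use),
bb-reorder has the information it needs to place the expected call on the
fall-through path instead of relying on the static heuristic.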
[Bug tree-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

--- Comment #9 from Joshua Green ---
(In reply to Segher Boessenkool from comment #8)
> GCC does some fairly involved prediction (in predict.c).  It isn't
> "a priori".
>
> > (It's also not clear HOW this could be "faster
> > on essentially all processors"
>
> Fall-through is faster than branching in most cases.  Most CPUs have
> some kind of pipelining on instruction fetch.

This is the point on which I'm confused.  I understand that fall through is
faster than branching, that it's good to keep the pipeline running smoothly.
It seems to me, though, that in this case the compiler has complete freedom in
deciding which function call (bar1() or bar2()) is in the "fall through case"
and which is in the "branching case."  Why not make the same choice as other
compilers do (and documentation recommends, and O0 does [, and O1 used to
do?]) by replacing the above O2-O3 code with

foo(bool):
        testb   %dil, %dil
        je      .L4
        jmp     bar1()
.L4:
        jmp     bar2()

?
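For reference, a plausible C rendering of the testcase behind the quoted
assembly (assumed here, since the original source is not reproduced in this
excerpt).  In the layout asked for above, bar1() sits on the fall-through path
and bar2() is reached only via the taken je .L4 branch, which is the opposite
of what GCC's 61%/39% guess from comment #6 produces:

#include <stdbool.h>

void bar1(void);
void bar2(void);

void foo(bool i)
{
    if (i)            /* fall-through call in the layout quoted above */
        bar1();
    else              /* reached via the taken je .L4 branch */
        bar2();
}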
[Bug tree-optimization/66573] Unexpected change in static, branch-prediction cost from O1 to O2 in if-then-else.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66573

--- Comment #11 from Joshua Green ---
(In reply to Segher Boessenkool from comment #10)
> GCC thinks bar2 will be executed more often than bar1; the code
> it generates is perfectly fine for that.
>
> If you think GCC's heuristics for branch prediction are no good,
> could use some improvement, you'll have to come up with more
> evidence than just a single artificial testcase.  Sorry.  These
> things were tuned on real code.

If gcc's heuristic is indeed optimal when tested over a reasonable sample of
real code, then I withdraw my objection.