[Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092 --- Comment #3 from Marc Glisse --- Does profile feedback (so we have an idea on the loop count) make any difference? It seems clear that for a loop that in practice just copies one long, having to arrange the arguments, make a function call, test for alignment, etc, is a lot of overhead. What to do about it though...
[Bug c++/94141] New: c++20 rewritten operator== recursive call mixing friend and external operators for template class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94141 Bug ID: 94141 Summary: c++20 rewritten operator== recursive call mixing friend and external operators for template class Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org Target Milestone: ---

(reduced from a user of boost/operators.hpp)

template <class T> class A;
template <class T> bool operator==(const A<T>&, int) { return false; }
template <class T> class A {
  friend bool operator==(int y, const A& x) { return x == y; }
};
int main(){ A<int> q; q==3; 3==q; }

$ g++ -std=c++2a a.c -Wall && ./a.out
a.c: In instantiation of 'bool operator==(int, const A<int>&)':
a.c:10:6:   required from here
a.c:5:56: warning: in C++20 this comparison calls the current function recursively with reversed arguments
    5 |   friend bool operator==(int y, const A& x) { return x == y; }
      |                                                      ~~^~~~
zsh: segmentation fault  ./a.out

If I make both operators friends, or move both outside, gcc is happy, but in this mixed case, it doesn't seem to want to use the first operator== and prefers the rewritten second operator==. Of course removing the second operator== completely also works. Clang is fine with this version of the code. I have trouble parsing the standard wording, but IIRC one of the principles when adding <=> was that explicitly written functions should have priority over new, invented ones. Bug 93807 is the closest I could find.
[Bug tree-optimization/61338] too many permutation in a vectorized "reverse loop"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61338 --- Comment #3 from Marc Glisse --- Possibly easier is the case of a reduction, where permutations are clearly irrelevant. int f(int*arr,int size){ int sum=0; for(int i = 0; i < size; i++){ sum += arr[size-1-i]; } return sum; } We still have a VEC_PERM_EXPR in the hot loop before accumulating. (by the way, we accumulate in a variable of type "vector(4) int", while I would expect "vector(4) unsigned int" for overflow reasons)
[Bug target/94194] x86: Provide feraiseexcept builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94194 --- Comment #1 from Marc Glisse --- Is there a convenient way for gcc to know the value of FE_DIVBYZERO, etc on the target? Do we need to hardcode it? Can we rely on different libc on the same processor to use the same value? What happens if the user calls feraiseexcept and compiles with -fno-trapping-math?
[Bug tree-optimization/94234] missed ccp folding for (addr + 8 * n) - (addr + 8 * (n - 1))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94234 --- Comment #2 from Marc Glisse --- The closest we have is /* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1). which does not handle conversions, although it should be possible to add them.
[Bug tree-optimization/94274] fold phi whose incoming args are defined from binary operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94274 --- Comment #1 from Marc Glisse --- Detecting common beginnings / endings in branches is something gcc does very seldom. Even at -Os, for if(cond)f(b);else f(c); we need to wait until rtl-optimizations to get a single call to f. (of course the reverse transformation of duplicating a statement that was after the branches into them, if it simplifies, is nice as well, and they can conflict) I don't know if handling one such very specific case (binary operations with a common argument) separately is a good idea when we don't even handle unary operations.
[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293 --- Comment #1 from Marc Glisse --- Adding inline void* operator new(std::size_t n){return __builtin_malloc(n);} inline void operator delete(void*p)noexcept{__builtin_free(p);} inline void operator delete(void*p,std::size_t)noexcept{__builtin_free(p);} lets gcc optimize. Without it, we end up with _37 = operator new (51); __builtin_memcpy (_37, "Hey... no small-string optimization for me please!", 50); MEM[(char_type &)_37 + 50] = 0; operator delete (_37, 51); return 123; I expect DSE (via tree-ssa-alias.c) doesn't know about delete the way it knows about free and thus doesn't see the stores as dead.
[Bug tree-optimization/94294] [missed optimization] new+delete of unused local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94294 --- Comment #2 from Marc Glisse --- (In reply to Eyal Rozenberg from comment #0) > Note: I suppose it's theoretically possible that this bug only manifests > because bug 94293 prevents the allocated space from being recognized as > unused; but I can't tell whether that's the case. Pretty sure that's the case, gcc removes new/delete pairs now when nothing writes to that memory.
[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293 --- Comment #4 from Marc Glisse --- Or just void f(){ int*p=new int[1]; *p=42; delete[] p; } while it does optimize for void f(){ int*p=new int; *p=42; delete p; } because the front-end gives us a clobber before operator delete.
[Bug libstdc++/94295] use __builtin_operator_new and __builtin_operator_delete when available
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94295 --- Comment #1 from Marc Glisse --- (In reply to Richard Smith from comment #0) > The C++ language rules do not permit optimization (eg, deletion) of direct > calls to 'operator new' and 'operator delete'. I thought that was considered a bug? Gcc does optimize those, like it does malloc/free... > This bug requests that libstdc++ uses these builtins when available. So just in std::allocator, or are there other places? > (Separately, it'd be great if GCC considered supporting them too.) IIRC (would need to dig up the conversation), when the optimization for new/delete pairs was added in gcc, the builtin option was rejected.
[Bug tree-optimization/94294] [missed optimization] new+delete of unused local string not removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94294 --- Comment #4 from Marc Glisse --- I don't believe there is a "new/delete" issue.
[Bug libstdc++/94295] use __builtin_operator_new and __builtin_operator_delete when available
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94295 --- Comment #5 from Marc Glisse --- (In reply to Richard Smith from comment #2) > (In reply to Marc Glisse from comment #1) > > (In reply to Richard Smith from comment #0) > > > The C++ language rules do not permit optimization (eg, deletion) of direct > > > calls to 'operator new' and 'operator delete'. > > > > I thought that was considered a bug? > > No, it's intentional: if the user directly calls '::operator new(42)' and > they've replaced that function, the replacement function is guaranteed to be > called. In this regard, 'operator new' is just a regular function with a > funny name. > > To be clear, the implicit call to 'operator new' produced by, say, 'new int' > *is* optimizable, but a direct explicit call to 'operator new(sizeof(int))' > is not. Ah, since you are here, and you appeared as an author of N3664 but not N3537 (precisely when this subtlety happened), could you explain why? It isn't discussed in the paper, complicates the design, and I cannot think of any use for this distinction (there are workarounds if people don't want their explicit call elided). This of course doesn't at all prevent us from adding a __builtin_operator_new option in std::allocator, it only affects how motivated we should be to fix the non-conformance.
[Bug c++/94314] New: [10 Regression] Optimizing mismatched new/delete pairs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94314 Bug ID: 94314 Summary: [10 Regression] Optimizing mismatched new/delete pairs Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org Target Milestone: ---

(originally posted at https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00276.html , I don't know if we will do something about it, but it seems worth documenting it in bugzilla)

Now that we optimize class-specific operator new/delete pairs (but you could do the same with the global replaceable ones as well):

#include <cstdio>
int count = 0;
struct A {
  __attribute__((malloc,noinline)) static void* operator new(unsigned long sz){++count;return ::operator new(sz);}
  static void operator delete(void* ptr){--count;::operator delete(ptr);}
};
int main(){
  delete new A;
  printf("%d\n",count); // Should print 0.
}

If we do not inline anything, we can remove the pair and nothing touches count. If we inline both new and delete, we can then remove the inner pair instead, count increases and decreases, fine. If we inline only one of them, and DCE the mismatched pair new/delete, we get something inconsistent (count is -1). This seems to indicate we should check that the new and delete match somehow...
[Bug c++/94314] [10 Regression] Optimizing mismatched new/delete pairs since r10-2106-g6343b6bf3bb83c87
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94314 --- Comment #5 from Marc Glisse --- I don't think we need heavy machinery linking new and delete (and if we did I'd be tempted to store it in some global table rather than in the nodes). The most important case is the global replaceable functions, for which we have a finite list, and for those a few checks like not matching array with non-array versions should do. For user overloads with attribute malloc (a gcc extension), I would go with heuristics like both/neither being class members, being members of the same class, etc. Although I am not quite sure how doable that is from the middle-end, how much of that information is still available (I think it is available in the mangled name, but demangling doesn't seem like a great idea).
[Bug libstdc++/94295] use __builtin_operator_new and __builtin_operator_delete when available
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94295 Marc Glisse changed: What|Removed |Added Status|UNCONFIRMED |NEW Keywords||missed-optimization Last reconfirmed||2020-03-26 Severity|normal |enhancement Ever confirmed|0 |1
[Bug tree-optimization/94356] Missed optimisation: useless multiplication generated for pointer comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94356 --- Comment #1 from Marc Glisse --- That's because internally we use an unsigned type for offsets (including for the multiplication). There have been tries to change that...
[Bug tree-optimization/94356] Missed optimisation: useless multiplication generated for pointer comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94356 --- Comment #3 from Marc Glisse --- I tried https://gcc.gnu.org/pipermail/gcc-patches/2017-May/475037.html some time ago. Having a different type for the multiplication and the offsetting introduced a lot of NOPs and caused a few regressions (from my notes, pta-ptrarith-3.c and ssa-pre-8.c were just testsuite issues, but there were others). I filed a bug or 2 about those (PR 88926 at least). Then I tried to also make the offsets signed, to be consistent, but that's a much bigger piece and I ran out of time and motivation.
[Bug libstdc++/63706] stl_heap.h:make_heap()'s worst time complexity doesn't conform with C++ standard
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63706 Marc Glisse changed: What|Removed |Added Last reconfirmed||2020-03-29 Status|UNCONFIRMED |WAITING Ever confirmed|0 |1 --- Comment #1 from Marc Glisse --- Trying exactly the construction described here with g++-4.9:

#include <algorithm>
#include <vector>
#include 
int count=0;
bool cmp(int a, int b){++count;return a<b;}
int main(){
  std::vector<int> v;
  const int n=1;
  for(int i=0;i
[Bug libstdc++/51965] Redundant move constructions in heap algorithms
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51965 --- Comment #19 from Marc Glisse --- (In reply to Jonathan Wakely from comment #16) > (In reply to Marc Glisse from comment #5) > > (The split into push_heap and __push_heap is just so the first part can be > > inlined without the second, right?) > > > > A more direct adaptation of the old code to rvalue references would be: > > > > std::__push_heap(__first, _DistanceType((__last - __first) - 1), > >_DistanceType(0), _ValueType(_GLIBCXX_MOVE(*(__last - > > 1; > > I tried doing this and it didn't seem to help the testcase attached here. push_heap(): default_ctors=0, copy_ctors=0, copy_assignments=0, swaps=0, [-cheap_dtors=1998,-] {+cheap_dtors=999,+} expensive_dtors=0, [-move_ctors=1998,-] {+move_ctors=999,+} cheap_move_assignments=2201, expensive_move_assignments=0, comparisons=2196 It doesn't help the other operations, but it has some effect on this one.
[Bug middle-end/94412] wrong code with -fsanitize=undefined and vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94412 Marc Glisse changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2020-03-30 Status|UNCONFIRMED |NEW --- Comment #1 from Marc Glisse --- Actually it seems to me that the code is only right with -fsanitize=undefined, it has to abort. We replace -v/11u by v/-11u because in fold-const.c we check: if ((!INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_UNDEFINED (type)) instead of ANY_INTEGRAL_TYPE_P for instance.
[Bug c++/94141] c++20 rewritten operator== recursive call mixing friend and external operators for template class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94141 --- Comment #1 from Marc Glisse --- It looks like clang-10+ also generates an infinite loop on this code. Does the standard really give priority to some implicit function over a user-defined one that is an exact match?
[Bug c++/94141] c++20 rewritten operator== recursive call mixing friend and external operators for template class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94141 --- Comment #2 from Marc Glisse --- Ah, maybe the friend function is not quite a template, so the generated swapped function is not a template either, and thus it has priority over a template if both are exact matches? This is going to break a number of users of boost/operators.hpp, and possibly other mixins using a similar technique.
[Bug middle-end/62080] Suboptimal code generation with eigen library
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62080 Marc Glisse changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |WAITING Last reconfirmed||2020-04-06 --- Comment #8 from Marc Glisse --- Even with gcc-4.8.4 (the oldest I have), I cannot reproduce the original report. Maybe Eigen changed since then. That's why we ask for self-contained testcases (possibly just the preprocessed source code).
[Bug c++/94314] [10 Regression] Optimizing mismatched new/delete pairs since r10-2106-g6343b6bf3bb83c87
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94314 --- Comment #10 from Marc Glisse --- I am still getting -1 at -O2 for

#include <cstdio>
#include 
int count = 0;
__attribute__((malloc,noinline)) void* operator new[](unsigned long sz){++count;return ::operator new(sz);}
void operator delete[](void* ptr)noexcept{--count;::operator delete(ptr);}
void operator delete[](void* ptr, std::size_t sz)noexcept{--count;::operator delete(ptr, sz);}
int main(){
  delete[] new int[1];
  printf("%d\n",count); // Should print 0.
}

I am not aware of any code that breaks in practice, but it still looks strange.
[Bug tree-optimization/94566] New: conversion between std::strong_ordering and int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94566 Bug ID: 94566 Summary: conversion between std::strong_ordering and int Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org Target Milestone: ---

#include <compare>
int conv1(std::strong_ordering s){
  if(s==std::strong_ordering::less) return -1;
  if(s==std::strong_ordering::equal) return 0;
  if(s==std::strong_ordering::greater) return 1;
  __builtin_unreachable();
}
std::strong_ordering conv2(int i){
  switch(i){
    case -1: return std::strong_ordering::less;
    case 0: return std::strong_ordering::equal;
    case 1: return std::strong_ordering::greater;
    default: __builtin_unreachable();
  }
}

Compiling with -std=gnu++2a -O3. I would like the compiler to notice that those are just NOP (at most a sign-extension). Clang manages it for conv2. Gcc generates:

movl $-1, %eax
cmpb $-1, %dil
je .L1
xorl %eax, %eax
testb %dil, %dil
setne %al
.L1:
ret

and

xorl %eax, %eax
testl %edi, %edi
je .L10
cmpl $1, %edi
sete %al
leal -1(%rax,%rax), %eax
.L10:
ret

(apparently the C++ committee thinks it is a good idea to provide a type that is essentially an int that can only be -1, 0 or 1, but not provide any direct way to convert to/from int)
[Bug tree-optimization/94589] New: Optimize (i<=>0)>0 to i>0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589 Bug ID: 94589 Summary: Optimize (i<=>0)>0 to i>0 Product: gcc Version: 10.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org Target Milestone: ---

g++-10 -std=gnu++2a -O3

#include <compare>
bool k(int i){
  auto c=i<=>0;
  return c>0;
}

<bb 2> [local count: 1073741824]:
if (i_1(D) != 0)
  goto <bb 3>; [50.00%]
else
  goto <bb 4>; [50.00%]

<bb 3> [local count: 536870913]:
_2 = i_1(D) >= 0;

<bb 4> [local count: 1073741824]:
# prephitmp_6 = PHI <_2(3), 0(2)>
return prephitmp_6;

For most comparisons @ we do optimize (i<=>0)@0 to just i@0, but not for > and <=. Spaceship operator<=> is very painful to use, but I expect we will end up seeing a lot of it with C++20, and comparing its result with 0 is almost the only way to use its output, so it seems important to optimize this common case. (there is probably a very old dup, but I couldn't find it)
[Bug tree-optimization/94566] conversion between std::strong_ordering and int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94566 --- Comment #3 from Marc Glisse --- I thought we had code to recognize a switch that represents a linear function, I was hoping that it would kick in with your hoisting patch...
[Bug rtl-optimization/94798] Failure to optimize subtraction and 0 literal properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94798 --- Comment #1 from Marc Glisse --- (In reply to Gabriel Ravier from comment #0) > Comparison here : https://godbolt.org/z/LZ8dBy In your future bug reports, could you please copy all relevant information instead of (or in addition to) linking to some external website just to show 3 lines of asm? Thanks.
[Bug tree-optimization/94801] Failure to optimize narrowed __builtin_clz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94801 --- Comment #1 from Marc Glisse --- Gcc considers that clz might return 32 on some platforms, it does not currently use target-specific information to restrict the range of clz output.
[Bug tree-optimization/94801] Failure to optimize narrowed __builtin_clz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94801 --- Comment #2 from Marc Glisse --- if(a==0)__builtin_unreachable(); lets gcc optimize the code.
[Bug libstdc++/94811] Please make make_tuple noexcept when possible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94811 Marc Glisse changed: What|Removed |Added Component|c++ |libstdc++ Severity|normal |enhancement --- Comment #1 from Marc Glisse --- Sure, it is possible, but isn't std::make_tuple mostly legacy at this point? With CTAD you can just use std::tuple, which is noexcept already. Each extra noexcept is one more chance to get things wrong, at least as long as wg21 refuses noexcept(auto), although this particular case doesn't seem particularly hard.
[Bug rtl-optimization/94804] Failure to elide useless movs in 128-bit addition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804 Marc Glisse changed: What|Removed |Added Keywords||ra --- Comment #2 from Marc Glisse --- Gcc's register allocation is not well optimized for hard registers at function boundaries (inlining makes this case not very important), there are several related bug reports. It would be nice to improve that, but it is likely to get lower priority than if you can find a similar issue in the middle of a hot loop.
[Bug rtl-optimization/94804] Failure to elide useless movs in 128-bit addition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804 --- Comment #4 from Marc Glisse --- (In reply to Gabriel Ravier from comment #3) > Having similar problems with useless movs is from the same non > well-optimized register allocation on function boundaries ? I don't know, but possibly not. I'll shut up because I am not a RA specialist... (and if you expect to see it optimized to bswap64, then obviously it is unrelated to register allocation)
[Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908 --- Comment #1 from Marc Glisse --- Even if we write __builtin_shuffle, the vector lowering pass turns it into the same code (constructor of BIT_FIELD_REFs), which seems to indicate that the target does not handle this pattern.
[Bug tree-optimization/94911] Failure to optimize comparisons of VLA sizes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94911 --- Comment #1 from Marc Glisse --- gcc computes sizeof(a) as 4ul*(size_t)n, and unsigned types don't provide nice overflow guarantees, so that complicates things.
[Bug tree-optimization/94911] Failure to optimize comparisons of VLA sizes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94911 --- Comment #3 from Marc Glisse --- Since VLA is an extension for compatibility with C, it is strange that it behaves differently (does one use the value of n at the time of the typedef and the other at the time of the declaration?). This bug is about the optimization, maybe a separate report about the C++ behavior would make sense.
[Bug c++/94905] Bogus warning -Werror=maybe-uninitialized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94905 --- Comment #1 from Marc Glisse --- Several of us asked, and it was rejected. Your next step is to provide a self-contained testcase (preprocessed sources?). You may also want to check if it still warns in gcc-10.
[Bug tree-optimization/94919] Failure to recognize max pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94919 --- Comment #1 from Marc Glisse --- This seems related to another one you reported, in the category: (i & -b) == (b ? i : 0) (for b ∈ {0,1}). The first form has the advantage of no branch, while the second is less obfuscated and simplifies more naturally (like when we combine x<0||y<0 to (x|y)<0 we get something shorter, but more obfuscated, which can hinder other optimizations). Here this equivalence gives the intermediate step of (x>=y?x^y:0)^y. I didn't see if you posted it somewhere, but I assume those tests come from the llvm testsuite? Did people really hit such weird code in the wild and add them one by one to llvm? Or was there some automated process involved? I could imagine looking at all the expressions of a certain size using some list of operators and only 2 variables (and maybe a few constants), computing the table of values for a small type (a 3-bit integer?), putting them in a hash table, and studying collisions. Generating the list of transformations completely automatically may be a bit hard though. And handling undefined cases (like signed overflow) complicates things, since we are not looking for exact matches there.
[Bug tree-optimization/94921] Failure to optimize nots with sub into single add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94921 --- Comment #1 from Marc Glisse --- x + y ?
[Bug tree-optimization/94930] Failure to optimize out subvsi in expansion of __builtin_memcmp with 1 as the operand with -ftrapv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94930 --- Comment #1 from Marc Glisse --- AFAIK -ftrapv doesn't work very well and is kind of abandoned, in favor of -fsanitize=signed-integer-overflow (possibly with -fsanitize-undefined-trap-on-error), which does generate the code you expect.
[Bug tree-optimization/94914] Failure to optimize check of high part of 64-bit result of 32 by 32 multiplication into overflow check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94914 --- Comment #4 from Marc Glisse --- I thought we might already simplify (u >> 32) != 0 to u >= cst (other possible forms are u != (uint64_t)(uint32_t)u, u & cst != 0, etc, I am trying to think which one looks most canonical). I expect in interesting cases the code will use z = (uint64_t)x * y twice, once to check for overflow, and once as (uint32_t)z to get the actual result (or there is a separate x*y and we want to CSE it with the overflow version).
[Bug tree-optimization/95001] std::terminate() and abort() do not have __builtin_unreachable() semantics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95001 --- Comment #3 from Marc Glisse --- Simpler example: [[noreturn]] void theend(); int f(int x){ if(x&7)theend(); return x&3; } (or replace "theend()" with "throw 42") We shouldn't compute x&3, it is always 0 in the branch where it is computed. But this simple example would probably be doable similarly to VRP, while the original example requires that the information be available during another pass (on-demand non-zero bits, like there is work on on-demand ranges?).
[Bug c/95044] [10/11 Regression] -Wreturn-local-addr false alarm since r10-1741-gaac9480da1ffd037
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95044 --- Comment #1 from Marc Glisse --- I think there is another very similar bug report. # buf_1 = PHI <&stack_buf(2), buf_15(6)> [...] if (&stack_buf != buf_1) in each branch, we thus know what buf_1 is, so we could replace it with buf_15 in # _3 = PHI <_17(5), buf_1(4)> return _3; (or is that bad for register pressure?) Or use it as a hint to thread that path, or add some logic to Wreturn_local_addr, but that's getting more complicated.
[Bug libstdc++/95065] Remove std::bind1st and std::bind2nd when building in -std=c++17
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95065 Marc Glisse changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #1 from Marc Glisse --- They are kept on purpose. *** This bug has been marked as a duplicate of bug 91383 ***
[Bug libstdc++/91383] C++17 should remove some library feature deprecated in C++14
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91383 Marc Glisse changed: What|Removed |Added CC||gcc at linkmauve dot fr --- Comment #6 from Marc Glisse --- *** Bug 95065 has been marked as a duplicate of this bug. ***
[Bug target/95115] [10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115 --- Comment #1 from Marc Glisse --- I am seeing the same thing on x86_64, happens during FRE1, so it looks like tree-optimization.
[Bug target/95115] [10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115 --- Comment #2 from Marc Glisse --- Or during CCP with the simpler double f(){ double d=__builtin_inf(); return d/d; } and all the -frounding-math -ftrapping-math -fsignaling-nans don't seem to help.
[Bug target/95115] [10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115 --- Comment #4 from Marc Glisse --- (In reply to Jim Wilson from comment #3) > The assumption here seems to be that if the user is > dividing constants, then we don't need to worry about setting exception > bits. If I write (4.0 / 3.0) for instance, the compiler just folds it and > doesn't worry about setting the inexact bit. We don't fold 0./0. though (unless -fno-trapping-math), it would make sense for Inf/Inf to behave the same. And -frounding-math protects 4./3. > divide gets moved after the fetestexcept call. That looks like a gcc bug Yes, that one is known.
[Bug tree-optimization/95246] Failure to optimize comparison between differently signed chars
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95246 --- Comment #1 from Marc Glisse --- On which version of LLVM did you see that? For me, gcc produces

movzbl %dil, %edi
movsbl %sil, %esi
cmpl %esi, %edi
setg %al

while clang skips the first 2 lines (but still emits movl), assuming that the input is already signed/zero extended, which points at ABI conventions. The transformation you suggest doesn't seem right to me.
[Bug c++/94141] c++20 rewritten operator== recursive call mixing friend and external operators for template class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94141 Marc Glisse changed: What|Removed |Added Last reconfirmed||2020-05-21 Status|UNCONFIRMED |SUSPENDED Ever confirmed|0 |1 --- Comment #3 from Marc Glisse --- It seems that this is as currently specified in C++20, but that some people are going to try and change the rules to avoid breaking code like this.
[Bug sanitizer/95279] UBSan doesn't seem to detect pointer overflow in certain cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95279 --- Comment #8 from Marc Glisse --- (In reply to Jakub Jelinek from comment #4) > There is nothing wrong on addition of -1, whether signed or cast to > size_t/uintptr_t, to a pointer, Looking at the standard (I am not a pro at that), one could easily interpret that p+(size_t)(-1) means adding a huge number to p, not subtracting 1. It does not say that the integer is cast to ptrdiff_t or anything like that.
[Bug sanitizer/95279] UBSan doesn't seem to detect pointer overflow in certain cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95279 --- Comment #12 from Marc Glisse --- (In reply to Jakub Jelinek from comment #10) > 1 + (size_t) -1 give 0 It wasn't obvious to me that the operation was supposed to happen in some C/C++ type (they don't say which one) or in a mathematical, infinite-precision sense. After all, they write 0≤i−j≤n which shouldn't be interpreted as C++ code. But you have more experience with reading these things, I believe you.
[Bug c++/95351] Comparison with NAN optimizes incorrectly with -ffast-math disabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95351 --- Comment #2 from Marc Glisse --- It might not be the issue, but merge_truthop_with_opposite_arm has a suspicious HONOR_NANS (type) where type is bool: the result of the comparison instead of one of the arguments.
[Bug middle-end/95353] [10/11 Regression] GCC can't build binutils
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95353 --- Comment #3 from Marc Glisse --- Do you need fr_literal to have size at least 1 (say, when creating an object on the stack), or can you use the official flexible array member (drop the 1, just [] in the declaration)?
[Bug tree-optimization/95393] Failure to optimize loop condition arithmetic for mismatched types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95393 --- Comment #1 from Marc Glisse --- It does optimize for me with -O2 or -O3. It could optimize earlier though, by the end of gimple, we are still trying to return max(s,0).
[Bug tree-optimization/95423] Failure to optimize separated multiplications by x and square of x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95423 --- Comment #3 from Marc Glisse --- We manage it with -fwrapv. This should happen late when we don't care about overflow anymore, or it needs to introduce casts to an unsigned type.
[Bug tree-optimization/95433] Failure to completely optimize simple compare after operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95433 Marc Glisse changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2020-05-30 --- Comment #1 from Marc Glisse --- It is 2*x==-2 that we fail to simplify. match.pd has code for x*2==y*2 or x*2==0 or even x*2.==-2. for floats, but apparently not for the special case of other constants for integers.
[Bug target/95435] bad builtin memcpy performance with znver1/znver2 and 32bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435 Marc Glisse changed: What|Removed |Added Target||x86-*-* --- Comment #3 from Marc Glisse --- "regression" means that a new version of gcc is working worse than an older one. Can you mention which older version you are comparing to and what results you were getting with it?
[Bug tree-optimization/95489] Failure to optimize x && (x & y) to x & y
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95489 --- Comment #2 from Marc Glisse --- (In reply to Richard Biener from comment #1) > (bit_and (ne (bit_and x_3 y_4) 0) (ne x_3 0)) This could be simplified > where I'd say we miss > > (bit_and (ne @0 integer_zerop) (ne @1 integer_zerop)) > > -> > > (ne (bit_and @0 @1) integer_zerop) This only seems possible for 1-bit types: 1!=0 & 2!=0 is not (1&2)!=0. To me, this falls in the general category of (x!=a)?f(x):y where y happens to be f(a) and f is not as costly as a condition+jump. I handled a few such cases a while ago with neutral_element_p, but it could be much more general (I am not saying it is easy).
[Bug c++/95384] Poor codegen cause by using base class instead of member for Optional construction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95384 --- Comment #2 from Marc Glisse --- Or with less templates:

struct A {
  A() = default;
  union S {
    constexpr S() noexcept : e() { }
    struct {} e;
    int i;
  } s;
  bool b = false;
};
struct B : A {
  B() = default;
  using A::A;
};
B f() { return {}; }

        movl    $0, -12(%rsp)
        movq    -16(%rsp), %rax
        ret

instead of a plain

        xorl    %eax, %eax

optimized dump is

  MEM [(struct B *)&D.2265 + 4B] = {};
  MEM[(union S *)&D.2265] ={v} {CLOBBER};
  D.2275 = D.2265;
  D.2265 ={v} {CLOBBER};
  return D.2275;

expand

(insn 5 2 6 2 (set (mem/c:SI (plus:DI (reg/f:DI 77 virtual-stack-vars) (const_int -12 [0xfff4])) [7 MEM [(struct B *)&D.2265 + 4B]+0 S4 A32]) (const_int 0 [0])) "o.cc":14:17 -1 (nil))
(insn 6 5 7 2 (set (reg:DI 83) (mem/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars) (const_int -16 [0xfff0])) [7 D.2265+0 S8 A64])) "o.cc":14:17 -1 (nil))
(insn 7 6 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars) (const_int -8 [0xfff8])) [7 D.2275+0 S8 A64]) (reg:DI 83)) "o.cc":14:17 -1 (nil))
(insn 8 7 9 2 (set (reg:DI 84) (const_int 0 [0])) "o.cc":14:17 -1 (nil))
(insn 9 8 10 2 (set (reg:DI 84) (mem/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars) (const_int -8 [0xfff8])) [7 D.2275+0 S8 A64])) "o.cc":14:17 -1 (nil))
(insn 10 9 11 2 (set (reg:DI 85) (reg:DI 84)) "o.cc":14:17 -1 (nil))
(insn 11 10 15 2 (set (reg:DI 82 [ ]) (reg:DI 85)) "o.cc":14:17 -1 (nil))
(insn 15 11 16 2 (set (reg/i:DI 0 ax) (reg:DI 82 [ ])) "o.cc":14:20 -1 (nil))
(insn 16 15 0 2 (use (reg/i:DI 0 ax)) "o.cc":14:20 -1 (nil))

and dfinish

(insn:TI 5 2 15 2 (set (mem/c:SI (plus:DI (reg/f:DI 7 sp) (const_int -12 [0xfff4])) [7 MEM [(struct B *)&D.2265 + 4B]+0 S4 A32]) (const_int 0 [0])) "o.cc":14:17 67 {*movsi_internal} (nil))
(insn 15 5 16 2 (set (reg/i:DI 0 ax) (mem/c:DI (plus:DI (reg/f:DI 7 sp) (const_int -16 [0xfff0])) [7 D.2265+0 S8 A64])) "o.cc":14:20 66 {*movdi_internal} (nil))
(insn 16 15 25 2 (use (reg/i:DI 0 ax)) "o.cc":14:20 -1 (nil))
[Bug middle-end/4210] should not warn in dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210 --- Comment #43 from Marc Glisse --- (In reply to Niels Möller from comment #42) > And what's the easiest way to run the the right compiler process (I guess > that's cc1) under gdb? gcc -c t.c -wrapper gdb,--args
[Bug libstdc++/95561] std::is_signed_v<__int128> is false
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95561 --- Comment #1 from Marc Glisse --- Are you using -std=gnu++17 or -std=c++17 ?
[Bug tree-optimization/95643] Optimizer fails to realize that a variable tested twice in a row is the same both times
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95643 --- Comment #1 from Marc Glisse --- After FRE1 we have _2 = x_9(D) == 0; if (_2 != 0) so we assert things for _2 and not x_9, and we lose the __builtin_unreachable information in CCP2.
[Bug libstdc++/90436] Redundant size checking in vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90436 --- Comment #2 from Marc Glisse --- (writing down some notes) Calling

size_type
_M_check_len_one(const char* __s) const
{
  if (max_size() - size() < 1)
    __throw_length_error(__N(__s));
  const size_type __len = size() + (std::max)(size(), (size_t)1);
  return (__len > max_size()) ? max_size() : __len;
}

instead of _M_check_len reduces the running time of this micro-benchmark

#include <vector>
int main(){
  volatile int a=0;
  for(int i=0;i<100;++i){
    std::vector<int> v;
    for(int j=0;j<1000;++j){
      v.push_back(j);
    }
    a=v[a];
  }
}

from .88s to .66s at -O3. Two key elements (the perf gain only comes if we do both) are removing the overflow check, and having the comparison between size and max_size optimized to be done on byte length (not divided by the element size). I think the overflow check could be removed from the normal _M_check_len: we have already checked that max_size() - size() >= __n, so size() + __n cannot overflow, and size() must be smaller than max_size(), which should be at most SIZE_MAX/2 (at least if ptrdiff_t and size_t have the same size), so size() + size() cannot overflow either. I should check if the compiler could help more. It is supposed to know how to optimize .ADD_OVERFLOW based on the range of the operands. I suspect that a single_use restriction explains why max_size() == size() compares values without division while max_size() - size() < __n (for __n = 1) doesn't.
[Bug libstdc++/90436] Redundant size checking in vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90436 --- Comment #3 from Marc Glisse ---

// possibly assumes that ptrdiff_t and size_t have the same size
size_type
_M_check_len_one(const char* __str) const
{
  ptrdiff_t __n = sizeof(_Tp);
  ptrdiff_t __ms = max_size();
  __ms *= sizeof(_Tp);
  ptrdiff_t __s = size();
  __s *= sizeof(_Tp);
  if (__s > (__ms - __n))
    __throw_length_error(__N(__str));
  const ptrdiff_t __len = __s + (std::max)(__s, __n);
  if (__len <= 0) __builtin_unreachable();
  ptrdiff_t __ret = (std::min)(__len, __ms);
  return (_Tp*)__ret - (_Tp*)0; // hack to generate divexact, so it simplifies with * sizeof(_Tp)
}

generates nicer code. But after those experiments, it seems clear that the performance of this code is irrelevant (not surprising since it is followed by a call to operator new), and its effect on global performance is random. Possibly it causes something to get aligned differently, which can randomly get this 25% speed-up, but can just as randomly go back to the slow version. Anyway, I don't think I'll be submitting any patch for this.
[Bug libstdc++/90436] Redundant size checking in vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90436 --- Comment #4 from Marc Glisse --- (side note not related to the redundant size checking) It is surprising how, in the code from comment 2, adding v.reserve(1000) does not help, it even slows the program down slightly here (yes, that's rather hard to believe). To reap the benefits, I also need to add in the loop: if(v.size()==v.capacity())__builtin_abort(); which enables the compiler to remove the reallocation code, and once that code is removed it can actually prove that size never reaches capacity and remove the call to abort! We don't even need __builtin_unreachable there. And once all that dead code is removed, it can finally vectorize.
[Bug tree-optimization/95801] Optimiser does not exploit the fact that an integer divisor cannot be zero
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95801 --- Comment #1 from Marc Glisse --- Except when dereferencing a pointer (?), gcc seldom uses an operation to derive properties on the operands; it mostly derives properties on the result. That's in large part because the information you are getting on the operands is only valid in some regions, not for the whole life of the SSA_NAME (in if(y!=0)x/y; the division obviously doesn't allow removing the earlier test for y!=0). There could be many cases:

x/y => y is not 0
i+1 => i is not INT_MAX
x/[ex]4 => the last 2 bits of x are 0
ptr+n or *ptr => ptr is not a null pointer

There is code in isolate-path to handle operands that are potentially 0, but I think that's only when we see x / PHI, not for a "normal" divisor. VRP works around the issue by creating extra SSA_NAMEs for the regions where we know more about a variable, but it only does it for branches like if(x<10); doing it for the operands of every operation would be too costly.
[Bug c/95818] wrong "used uninitialized" warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95818 --- Comment #3 from Marc Glisse --- Richard said "complete", that is the whole .i file, not just one random function. If we cannot reproduce the issue by copying your code and compiling it, we can't do anything about your report.
[Bug tree-optimization/95906] Failure to recognize max pattern with mask
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95906 --- Comment #1 from Marc Glisse --- I'd say generate a (vec_)cond_expr, not directly a max. That is, replace the comparison with any truth_valued_p (hmm, that function probably stopped working for vectors when all comparisons were wrapped in vec_cond for avx512).
[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663 --- Comment #12 from Marc Glisse --- (In reply to Jeffrey A. Law from comment #10) > __builtin_trap emits an actual trap into the instruction stream which halts > the process immediately which is *much* better from a security standpoint Regardless of what the default is, I think we should be able to agree that there are uses where we want to favor hardening/security (public facing servers, web browsers), and others where performance is more important (scientific simulations), and it would be nice to give users a choice. (I think sanitizers already provide a way to turn __builtin_unreachable into __builtin_trap, but that's more meant for explicit __builtin_unreachable in user code)
[Bug tree-optimization/95926] Failure to optimize xor pattern when using temporary variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95926 --- Comment #1 from Marc Glisse --- It looks different to gcc because in the first case tmp is used twice, while in the second case each a&b is only used once, and gcc only transforms (a&b)^b into b&~a if that is the only use of a&b. Yes, this heuristic often backfires, but as long as we count &~ as 2 operations, removing the restriction could generate worse code in some cases.
[Bug tree-optimization/95924] Failure to optimize some bit magic to one of the operands
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95924 --- Comment #1 from Marc Glisse ---
* If I replace ~a with !a, we manage to do everything with type bool. With ~a, we don't, we stick to int.
* We don't handle a?b:false the same as a&&b.
* Even for (a | !b) && (!(!a & b) && a) we don't completely simplify, because that would be replacing too many && with & (I think). If I manually replace one && with &, gcc manages.
[Bug tree-optimization/95929] Failure to optimize tautological comparisons of comparisons to a single one
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95929 --- Comment #1 from Marc Glisse --- Here gcc does optimize the first f to (a != 0) ^ (b != 0). However, for the second f, it does indeed generate something that looks like the first f before optimization... The optimization for the first f is probably "(X && !Y) || (!X && Y) is X ^ Y" in fold-const.c, which may not have an equivalent in match.pd yet.
[Bug tree-optimization/95923] Failure to optimize bool checks into and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95923 --- Comment #1 from Marc Glisse --- (With one variant I ended up with (a|b)&(a==b), which we don't optimize to a&b) We don't optimize !(!a && !b) && !(!a && b) && !(a && !b) (we keep several branches), but we do optimize if I manually replace enough && with &.
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971 --- Comment #11 from Marc Glisse --- while(!a.isZero()); — that doesn't look like something you would find in real code. Are you waiting for a different thread to modify a? Then you should use an atomic operation. Are you waiting for the hardware to change something? Use volatile. Do you really want an infinite loop? Spell it out: if(!a.isZero())for(;;);
[Bug tree-optimization/96009] missed optimization with floating point operations and integer literals
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96009 --- Comment #2 from Marc Glisse --- Note that we don't do the optimization if you replace double with long either.
[Bug c++/96065] Move elision of returned automatic variable doesn't happen the variable is enclosed in a block
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96065 Marc Glisse changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #1 from Marc Glisse --- dup *** This bug has been marked as a duplicate of bug 51571 ***
[Bug c++/51571] No named return value optimization while adding a dummy scope
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51571 Marc Glisse changed: What|Removed |Added CC||b7.10110111 at gmail dot com --- Comment #6 from Marc Glisse --- *** Bug 96065 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/96108] Different behavior in DSE pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96108 --- Comment #4 from Marc Glisse --- During optimization, we often have branches with dead code that would exhibit UB if it was ever executed. Cleaning up those branches as much as possible helps reduce code size, show that some variables (in the live part of the code) are constant, etc. If we see *p=x where p is uninitialized, it doesn't serve much purpose to keep it as is, we might as well replace it with *0=0 or trap/unreachable depending on options.
[Bug c++/96121] Uninitialized variable copying not diagnosed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121 --- Comment #2 from Marc Glisse --- gcc warns for this at the level of actual instructions, not user code. Since A is empty, nothing uninitialized is getting copied.
[Bug c++/96121] Uninitialized variable copying not diagnosed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121 --- Comment #3 from Marc Glisse --- And this translation unit doesn't actually generate any code at all, so the way the warning is currently implemented has no chance of even looking at it.
[Bug c++/96121] Uninitialized variable copying not diagnosed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121 --- Comment #5 from Marc Glisse --- Yes, then we are back to the fact that it works for A=int but not for A a class containing an int.
[Bug libstdc++/96088] Range insertion into unordered_map is less effective than a loop with insertion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96088 --- Comment #3 from Marc Glisse --- (In reply to Jonathan Wakely from comment #2) > Or use unordered_map, equal_to<>> which > should perform better. Good idea. > We haven't implemented http://wg21.link/p0919r3 and http://wg21.link/p1690r1 > yet, I wonder if those would help, especially if we make the internal > helpers available pre-C++20. That could allow the range insertion to use the > heteregenous lookup, to avoid creating temporaries. I'm not sure if that > would be conforming though. Heterogeneous lookup is observably different, > and not conforming in C++17. Restricting it to a few standard types like string should not be observable. > Adding hash::operator()(string_view) is an interesting idea for the > standard though. Indeed. If we want to, I think it is possible to add some overloads for when the argument is exactly const char* or string_view, which should remain conforming and provide a significant part of the benefits.
[Bug tree-optimization/96369] Wrong evaluation order of || operator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96369 Marc Glisse changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2020-07-29 Keywords||wrong-code Component|c |tree-optimization Status|UNCONFIRMED |NEW
[Bug target/96327] Inefficient increment through pointer to volatile on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96327 --- Comment #5 from Marc Glisse --- I don't think bug 3506 has been fixed (its status seems wrong to me). But don't worry, there are several other duplicates that still have status NEW (bug 50677 for instance). This is a sensible enhancement request; I think some gcc backends already do a similar optimization, it simply isn't a priority, because volatile almost means "don't optimize this". At least the difference between the gcc and clang codes matches those other PRs. Not sure why you are talking about address computations.
[Bug tree-optimization/96392] Optimize x+0.0 if x is an integer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96392 Marc Glisse changed: What|Removed |Added Last reconfirmed||2020-07-30 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Severity|normal |enhancement
[Bug tree-optimization/95433] Failure to completely optimize simple compare after operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95433 --- Comment #5 from Marc Glisse --- Patch posted at https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551154.html for the original testcase. Note that solving univariate polynomial equations *in the integers* (the rationals are not much harder) is actually rather simple: enumerate the divisors of the constant term and evaluate the polynomial on each of them to check which ones are roots. If someone wants to implement that...
[Bug middle-end/96426] New: __builtin_convertvector ICE without lhs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96426 Bug ID: 96426 Summary: __builtin_convertvector ICE without lhs Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org Target Milestone: ---

typedef long long veci __attribute__((vector_size(16)));
typedef double vecf __attribute__((vector_size(16)));
void f(veci v){ __builtin_convertvector(v,vecf); }

$ gcc u.c
during GIMPLE pass: veclower
u.c: In function 'f':
u.c:3:6: internal compiler error: Segmentation fault
    3 | void f(veci v){
      |      ^

Found by inspection: I wanted to introduce another conversion-like IFN (not pure) and didn't know where to store the result type in case the lhs is missing. Here the easiest would be to skip the stmt if there is no lhs, even if that's not perfect for -O0, but I am also interested in how to handle the non-pure case, if you have ideas...
[Bug tree-optimization/96433] Failed to optimize (A / N) * N <= A
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96433 Marc Glisse changed: What|Removed |Added Component|c |tree-optimization Last reconfirmed||2020-08-03 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Version|tree-ssa|11.0 --- Comment #2 from Marc Glisse --- Replacing (A/B)*B with A-A%B makes the transformation a bit simpler (and the signed variants with ceil/floor), but we probably don't want to do that transformation all the time (?). Which leaves the rather specialized (A/B)*B cmp A...
[Bug target/70314] AVX512 not using kandw to combine comparison results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314 --- Comment #4 from Marc Glisse --- We now generate for the original testcase

        vpcmpd  $1, %zmm3, %zmm2, %k1
        vpcmpd  $1, %zmm1, %zmm0, %k0{%k1}
        vpmovm2d        %k0, %zmm0

which looks great. However, using | instead of &, we get

        vpcmpd  $1, %zmm1, %zmm0, %k0
        vpcmpd  $1, %zmm3, %zmm2, %k1
        kmovw   %k0, %eax
        kmovw   %k1, %edx
        orl     %edx, %eax
        kmovw   %eax, %k2
        vpmovm2d        %k2, %zmm0

Well, at least gimple did what it could, and it is now up to the target to handle logical operations on bool vectors / k* registers. There is probably already another bug report about that...
[Bug tree-optimization/95906] Failure to recognize max pattern with mask
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95906 --- Comment #3 from Marc Glisse --- With the patch (which only affects vectors), f becomes (a>b)?a:b. It should be easy to add the corresponding transform to MAX_EXPR in match.pd.
[Bug middle-end/88670] [meta-bug] generic vector extension issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88670 Bug 88670 depends on bug 70314, which changed state. Bug 70314 Summary: AVX512 not using kandw to combine comparison results https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug target/70314] AVX512 not using kandw to combine comparison results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314 Marc Glisse changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from Marc Glisse --- Ok, there are enough duplicates for that part, this particular bug report was mostly about the gimple part, which is fixed now. Closing.
[Bug tree-optimization/96513] building terminated with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96513 --- Comment #1 from Marc Glisse --- What you could do, even if it is private code, is reduce it (https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction) until it is very small and doesn't give away any IP, and then post it. Otherwise, there is not much we can do from your report and we are probably just going to close it...
[Bug target/96528] New: [11 Regression] vector comparisons on ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96528 Bug ID: 96528 Summary: [11 Regression] vector comparisons on ARM Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org Target Milestone: --- Target: arm-none-linux-gnueabihf (see the discussion after https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551468.html ) I am using a compiler configured with --target=arm-none-linux-gnueabihf --with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp16

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));
vi f(vec a,vec b){
  return a==5 | b==7;
}

Compiling with -O yields very long scalar code. Adding -fno-tree-forwprop gets back the nice, vector code. (at higher optimization levels, one may also need to disable vrp) This is due to the fact that while the ARM target handles VEC_COND_EXPR just fine, it does not handle a plain v == w that is not fed directly to a VEC_COND_EXPR. I was surprised to notice that "grep vec_cmp" gives a number of lines in the aarch64/ directory, but none in arm/, while AFAIK those neon instructions are the same. Would it be possible to implement this on ARM as well? Other middle-end options are also possible, but the difference with aarch64 makes it tempting to handle it in the target.
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 --- Comment #2 from Marc Glisse --- (In reply to Jakub Jelinek from comment #1)
> In bar, this is optimized, because fold_binary_op_with_conditional_arg optimizes
> 255 >> (x ? 1 : 0) into x ? 127 : 255 and when multiplied by two in unsigned
> char this results in x ? 254 : 254.
> We don't have anything comparable in match.pd yet I believe (and should we?).

We have something for VEC_COND_EXPR to fold a op (b?c:d), but not for COND_EXPR, which you would be unlikely to see in gimple (and the generator of phiopt transforms from match.pd patterns hasn't appeared yet). Also, we only have x!=0, and while fold_binary_op_with_conditional_arg tries to handle it like x!=0?1:0, we indeed don't do anything like that for gimple. And it seems possibly better suited to forward propagation than backward like match.pd.

> Or shall say VRP try harder if it sees [0, 1] ranges?

If a range has only 2 (or some other small number) values, try propagating each and see if some variables end up with the same value in both cases? Or if enough simplifications occur that it is worth introducing a conditional? I am not sure it would be worth the trouble.

> Though, shouldn't we optimize e.g.
> unsigned
> baz (unsigned int x)
> {
>   if (x >= 4) return 32;
>   return (-1U >> x) * 16;
> }
> too to return x >= 4 ? 32U : -16U; ?
> Not sure where and how to generalize it though.
> Value range of -1U >> [0, 3] is not really useful here, nonzero bits either.
> And having a specialized (const1 >> x) * const2 optimizer based on x's value
> range would work, but not sure if it has a real-world benefit.

And here this is complicated by the fact that we do not narrow the operation, so it is less obvious that the constant is -1.
[Bug c/96550] gcc is smart in figuring out a non-returning function.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96550 --- Comment #1 from Marc Glisse --- Does -fno-delete-null-pointer-checks help?
[Bug target/50829] avx extra copy for _mm256_insertf128_pd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829 Marc Glisse changed: What|Removed |Added Status|NEW |RESOLVED Known to work||10.1.0 Resolution|--- |FIXED Known to fail||9.3.0 --- Comment #14 from Marc Glisse --- This was fixed (by Jakub I think).
[Bug rtl-optimization/48037] Missed optimization: unnecessary register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48037 --- Comment #10 from Marc Glisse --- We now generate just

        sqrtpd  %xmm0, %xmm0

for 2 and 4,

        sqrtpd  (%rdi), %xmm0

for 3, and

        movupd  (%rdi), %xmm0
        sqrtpd  %xmm0, %xmm0

for 1 (for alignment reasons, I guess; the movu disappears with -mavx). Should we close it as fixed? Or update the testcase to perform a different operation that isn't recognized?
[Bug tree-optimization/96563] Failure to optimize loop with condition to simple arithmetic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96563 Marc Glisse changed: What|Removed |Added Last reconfirmed||2020-08-11 Ever confirmed|0 |1 Severity|normal |enhancement Status|UNCONFIRMED |NEW Keywords||missed-optimization --- Comment #1 from Marc Glisse --- At -O2 (before uncprop)

 [local count: 151290756]:

 [local count: 976138698]:
 # i_9 = PHI <0(2), i_4(6)>
 if (x_3(D) == i_9)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 922451069]:
 i_4 = i_9 + 1;
 if (i_4 != 10)
   goto ; [89.42%]
 else
   goto ; [10.58%]

 [local count: 97603126]:
 goto ; [100.00%]

 [local count: 824847943]:
 goto ; [100.00%]

 [local count: 53687628]:

 [local count: 151290756]:
 # _2 = PHI <8(7), 4(8)>
 return _2;

We don't really do anything special. At -O3, the loop gets unrolled:

 [local count: 151290757]:
 if (x_3(D) == 0)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 8320992]:
 goto ; [100.00%]

 [local count: 142969766]:
 if (x_3(D) == 1)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 7863337]:
 goto ; [100.00%]

 [local count: 135106428]:
 if (x_3(D) == 2)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 7430853]:
 goto ; [100.00%]

 [local count: 127675576]:
 if (x_3(D) == 3)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 7022157]:
 goto ; [100.00%]

 [local count: 120653419]:
 if (x_3(D) == 4)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 6635938]:
 goto ; [100.00%]

 [local count: 114017483]:
 if (x_3(D) == 5)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 6270962]:
 goto ; [100.00%]

 [local count: 107746521]:
 if (x_3(D) == 6)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 5926059]:
 goto ; [100.00%]

 [local count: 101820460]:
 if (x_3(D) == 7)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 5600125]:
 goto ; [100.00%]

 [local count: 96220334]:
 if (x_3(D) == 8)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 5292118]:
 goto ; [100.00%]

 [local count: 90928219]:
 if (x_3(D) == 9)
   goto ; [5.50%]
 else
   goto ; [94.50%]

 [local count: 5001052]:
 goto ; [100.00%]

 [local count: 97603126]:

 [local count: 151290756]:
 # _2 = PHI <8(14), 4(12), 8(22), 8(21), 8(20), 8(19), 8(18), 8(17), 8(16), 8(15), 8(23)>
 return _2;

We have code in reassoc to handle x==0||x==1||x==2 and turn it into a range test. I suspect the issue is related to those empty bb between the condition and the PHI that hide the fact that they are jumping to the same place eventually. Fixing this for the unrolled case is thus probably easiest, although of course it would be nice if it also worked for 99 instead of 9, where we are not going to unroll.