[Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+

2020-03-09 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092

--- Comment #3 from Marc Glisse  ---
Does profile feedback (so we have an idea on the loop count) make any
difference? It seems clear that for a loop that in practice just copies one
long, having to arrange the arguments, make a function call, test for
alignment, etc, is a lot of overhead. What to do about it though...
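
To make the overhead concrete, here is a minimal sketch of the kind of loop that -ftree-loop-distribute-patterns may turn into a library call. The function names copy_loop and copy_call are hypothetical and do not come from the report; the second function only illustrates what the loop amounts to once it is replaced, assuming the compiler can establish that the two ranges do not overlap.

```cpp
#include <cstddef>
#include <cstring>

// Element-by-element copy. With -ftree-loop-distribute-patterns the
// compiler may recognize this loop and emit a memcpy call instead.
void copy_loop(long* dst, const long* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// Hand-written equivalent of the transformed loop (assumes dst and src
// do not overlap). For n == 1 the call overhead dominates the work.
void copy_call(long* dst, const long* src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(long));
}
```

When n is almost always 1, the first form compiles to a single load and store, while the second pays for argument setup and the call itself.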

[Bug c++/94141] New: c++20 rewritten operator== recursive call mixing friend and external operators for template class

2020-03-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94141

Bug ID: 94141
   Summary: c++20 rewritten operator== recursive call mixing
friend and external operators for template class
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

(reduced from a user of boost/operators.hpp)

template <class T> class A;
template <class T> bool operator==(const A<T>&, int) { return false; }

template <class T> class A {
  friend bool operator==(int y, const A<T>& x) { return x == y; }
};

int main(){
  A<int> q;
  q==3;
  3==q;
}

$ g++ -std=c++2a a.c -Wall && ./a.out
a.c: In instantiation of 'bool operator==(int, const A&)':
a.c:10:6:   required from here
a.c:5:56: warning: in C++20 this comparison calls the current function
recursively with reversed arguments
5 |   friend bool operator==(int y, const A& x) { return x == y; }
  |  ~~^~~~
zsh: segmentation fault  ./a.out

If I make both operators friends, or move both outside, gcc is happy, but in
this mixed case, it doesn't seem to want to use the first operator== and
prefers the rewritten second operator==. Of course removing the second
operator== completely also works.
Clang is fine with this version of the code.

I have trouble parsing the standard wording, but IIRC one of the principles
when adding <=> was that explicitly written functions should have priority over
new, invented ones.

Bug 93807 is the closest I could find.
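
For comparison, a minimal sketch of one of the variants described above as working: both operators moved outside the class. The helper compare_both_orders is hypothetical and only exercises the two argument orders.

```cpp
template <class T> class A {};

// Both operators are declared outside the class, which is one of the
// variants the report says gcc accepts without recursing.
template <class T> bool operator==(const A<T>&, int) { return false; }
template <class T> bool operator==(int y, const A<T>& x) { return x == y; }

// Hypothetical helper: evaluates both argument orders.
inline bool compare_both_orders() {
    A<int> q;
    bool left = (q == 3);
    bool right = (3 == q);
    return left || right;
}
```

Here the call x == y inside the second operator resolves to the first, explicitly written operator, so both comparisons return false instead of recursing.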

[Bug tree-optimization/61338] too many permutation in a vectorized "reverse loop"

2020-03-16 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61338

--- Comment #3 from Marc Glisse  ---
Possibly easier is the case of a reduction, where permutations are clearly
irrelevant.

int f(int*arr,int size){
  int sum=0;
  for(int i = 0; i < size; i++){
    sum += arr[size-1-i];
  }
  return sum;
}

We still have a VEC_PERM_EXPR in the hot loop before accumulating.

(by the way, we accumulate in a variable of type "vector(4) int", while I would
expect "vector(4) unsigned int" for overflow reasons)
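
A small sketch of why the permutation is irrelevant for a reduction: summing in reverse index order gives the same result as summing forward. The function names are hypothetical, and the accumulation is done in unsigned arithmetic here (an assumption of this sketch, not what the vectorizer emits) so that wraparound is well defined.

```cpp
#include <cstddef>

// Reverse-order sum, accumulating in unsigned so overflow wraps.
unsigned sum_reverse(const int* arr, std::size_t size) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += static_cast<unsigned>(arr[size - 1 - i]);
    return sum;
}

// Forward-order sum. Addition modulo 2^N is commutative and
// associative, so the result is identical to sum_reverse.
unsigned sum_forward(const int* arr, std::size_t size) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += static_cast<unsigned>(arr[i]);
    return sum;
}
```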

[Bug target/94194] x86: Provide feraiseexcept builtins

2020-03-16 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94194

--- Comment #1 from Marc Glisse  ---
Is there a convenient way for gcc to know the value of FE_DIVBYZERO, etc on the
target? Do we need to hardcode it? Can we rely on different libc on the same
processor to use the same value?
What happens if the user calls feraiseexcept and compiles with
-fno-trapping-math?

[Bug tree-optimization/94234] missed ccp folding for (addr + 8 * n) - (addr + 8 * (n - 1))

2020-03-20 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94234

--- Comment #2 from Marc Glisse  ---
The closest we have is

/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).

which does not handle conversions, although it should be possible to add them.
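
As an illustration, the expression from the bug title can be written with char pointers so that the factor 8 is explicit. This is a sketch with hypothetical function names; it assumes both computed pointers stay within (or one past the end of) the same array.

```cpp
#include <cstddef>

// The expression from the bug title: (addr + 8 * n) - (addr + 8 * (n - 1)).
std::ptrdiff_t diff_original(const char* addr, std::size_t n) {
    return (addr + 8 * n) - (addr + 8 * (n - 1));
}

// After applying (A * C) - (B * C) -> (A - B) * C:
// 8 * n - 8 * (n - 1) = 8 * (n - (n - 1)) = 8.
std::ptrdiff_t diff_folded(const char*, std::size_t) {
    return 8;
}
```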

[Bug tree-optimization/94274] fold phi whose incoming args are defined from binary operations

2020-03-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94274

--- Comment #1 from Marc Glisse  ---
Detecting common beginnings / endings in branches is something gcc does very
seldom. Even at -Os, for if(cond)f(b);else f(c); we need to wait until
rtl-optimizations to get a single call to f. (of course the reverse
transformation of duplicating a statement that was after the branches into
them, if it simplifies, is nice as well, and they can conflict)
I don't know if handling one such very specific case (binary operations with a
common argument) separately is a good idea when we don't even handle unary
operations.
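
A minimal sketch of the if(cond)f(b);else f(c); case mentioned above. The body of f is a hypothetical stand-in, and call_once shows the form with the common call merged by hand.

```cpp
// Hypothetical callee standing in for f in the comment.
inline int f(int v) { return v * 2 + 1; }

// The call appears in both branches.
int call_in_both_branches(bool cond, int b, int c) {
    if (cond)
        return f(b);
    else
        return f(c);
}

// The common call is factored out; only the argument depends on cond.
int call_once(bool cond, int b, int c) {
    return f(cond ? b : c);
}
```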

[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed

2020-03-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293

--- Comment #1 from Marc Glisse  ---
Adding

inline void* operator new(std::size_t n){return __builtin_malloc(n);}
inline void operator delete(void*p)noexcept{__builtin_free(p);}
inline void operator delete(void*p,std::size_t)noexcept{__builtin_free(p);}

lets gcc optimize. Without it, we end up with

  _37 = operator new (51);
  __builtin_memcpy (_37, "Hey... no small-string optimization for me please!",
50);
  MEM[(char_type &)_37 + 50] = 0;
  operator delete (_37, 51);
  return 123;

I expect DSE (via tree-ssa-alias.c) doesn't know about delete the way it knows
about free and thus doesn't see the stores as dead.

[Bug tree-optimization/94294] [missed optimization] new+delete of unused local string not removed

2020-03-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94294

--- Comment #2 from Marc Glisse  ---
(In reply to Eyal Rozenberg from comment #0)
> Note: I suppose it's theoretically possible that this bug only manifests
> because bug 94293 prevents the allocated space from being recognized as
> unused; but I can't tell whether that's the case.

Pretty sure that's the case; gcc now removes new/delete pairs when nothing
writes to that memory.

[Bug tree-optimization/94293] [missed optimization] Useless statements populating local string not removed

2020-03-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94293

--- Comment #4 from Marc Glisse  ---
Or just

void f(){
  int*p=new int[1];
  *p=42;
  delete[] p;
}

while it does optimize for

void f(){
  int*p=new int;
  *p=42;
  delete p;
}

because the front-end gives us a clobber before operator delete.

[Bug libstdc++/94295] use __builtin_operator_new and __builtin_operator_delete when available

2020-03-24 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94295

--- Comment #1 from Marc Glisse  ---
(In reply to Richard Smith from comment #0)
> The C++ language rules do not permit optimization (eg, deletion) of direct
> calls to 'operator new' and 'operator delete'.

I thought that was considered a bug?
Gcc does optimize those, like it does malloc/free...

> This bug requests that libstdc++ uses these builtins when available.

So just in std::allocator, or are there other places?

> (Separately, it'd be great if GCC considered supporting them too.)

IIRC (would need to dig up the conversation), when the optimization for
new/delete pairs was added in gcc, the builtin option was rejected.

[Bug tree-optimization/94294] [missed optimization] new+delete of unused local string not removed

2020-03-24 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94294

--- Comment #4 from Marc Glisse  ---
I don't believe there is a "new/delete" issue.

[Bug libstdc++/94295] use __builtin_operator_new and __builtin_operator_delete when available

2020-03-24 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94295

--- Comment #5 from Marc Glisse  ---
(In reply to Richard Smith from comment #2)
> (In reply to Marc Glisse from comment #1)
> > (In reply to Richard Smith from comment #0)
> > > The C++ language rules do not permit optimization (eg, deletion) of direct
> > > calls to 'operator new' and 'operator delete'.
> > 
> > I thought that was considered a bug?
> 
> No, it's intentional: if the user directly calls '::operator new(42)' and
> they've replaced that function, the replacement function is guaranteed to be
> called. In this regard, 'operator new' is just a regular function with a
> funny name.
> 
> To be clear, the implicit call to 'operator new' produced by, say, 'new int'
> *is* optimizable, but a direct explicit call to 'operator new(sizeof(int))'
> is not.

Ah, since you are here, and you appeared as an author of N3664 but not N3537
(precisely when this subtlety happened), could you explain why? It isn't
discussed in the paper, complicates the design, and I cannot think of any use
for this distinction (there are workarounds if people don't want their explicit
call elided).

This of course doesn't prevent adding a __builtin_operator_new option in
std::allocator at all; it only affects how motivated we should be to fix the
non-conformance.

[Bug c++/94314] New: [10 Regression] Optimizing mismatched new/delete pairs

2020-03-24 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94314

Bug ID: 94314
   Summary: [10 Regression] Optimizing mismatched new/delete pairs
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

(originally posted at
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00276.html , I don't know
if we will do something about it, but it seems worth documenting it in
bugzilla)

Now that we optimize class-specific operator new/delete pairs (but you could do
the same with the global replaceable ones as well):

#include <stdio.h>
int count = 0;
struct A {
  __attribute__((malloc,noinline))
  static void* operator new(unsigned long sz){++count;return ::operator
new(sz);}
  static void operator delete(void* ptr){--count;::operator delete(ptr);}
};
int main(){
  delete new A;
  printf("%d\n",count); // Should print 0.
}

If we do not inline anything, we can remove the pair and nothing touches count.
If we inline both new and delete, we can then remove the inner pair instead,
count increases and decreases, fine. If we inline only one of them, and DCE the
mismatched pair new/delete, we get something inconsistent (count is -1).

This seems to indicate we should check that the new and delete match somehow...

[Bug c++/94314] [10 Regression] Optimizing mismatched new/delete pairs since r10-2106-g6343b6bf3bb83c87

2020-03-25 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94314

--- Comment #5 from Marc Glisse  ---
I don't think we need heavy machinery linking new and delete (and if we did I'd
be tempted to store it in some global table rather than in the nodes). The most
important case is the global replaceable functions, for which we have a finite
list, and for those a few checks like not matching array with non-array
versions should do. For user overloads with attribute malloc (a gcc extension),
I would go with heuristics like both/neither being class members, being members
of the same class, etc. Although I am not quite sure how doable that is from
the middle-end, how much of that information is still available (I think it is
available in the mangled name, but demangling doesn't seem like a great idea).

[Bug libstdc++/94295] use __builtin_operator_new and __builtin_operator_delete when available

2020-03-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94295

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization
   Last reconfirmed||2020-03-26
   Severity|normal  |enhancement
 Ever confirmed|0   |1

[Bug tree-optimization/94356] Missed optimisation: useless multiplication generated for pointer comparison

2020-03-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94356

--- Comment #1 from Marc Glisse  ---
That's because internally we use an unsigned type for offsets (including for
the multiplication). There have been tries to change that...

[Bug tree-optimization/94356] Missed optimisation: useless multiplication generated for pointer comparison

2020-03-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94356

--- Comment #3 from Marc Glisse  ---
I tried https://gcc.gnu.org/pipermail/gcc-patches/2017-May/475037.html some
time ago. Having a different type for the multiplication and the offsetting
introduced a lot of NOPs and caused a few regressions (from my notes,
pta-ptrarith-3.c and ssa-pre-8.c were just testsuite issues, but there were
others). I filed a bug or 2 about those (PR 88926 at least). Then I tried to
also make the offsets signed, to be consistent, but that's a much bigger piece
and I ran out of time and motivation.

[Bug libstdc++/63706] stl_heap.h:make_heap()'s worst time complexity doesn't conform with C++ standard

2020-03-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63706

Marc Glisse  changed:

   What|Removed |Added

   Last reconfirmed||2020-03-29
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
Trying exactly the construction described here with g++-4.9:

#include 
#include 
#include 

int count=0;
bool cmp(int a, int b){++count;return a v;
  const int n=1;
  for(int i=0;i

[Bug libstdc++/51965] Redundant move constructions in heap algorithms

2020-03-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51965

--- Comment #19 from Marc Glisse  ---
(In reply to Jonathan Wakely from comment #16)
> (In reply to Marc Glisse from comment #5)
> > (The split into push_heap and __push_heap is just so the first part can be
> > inlined without the second, right?)
> > 
> > A more direct adaptation of the old code to rvalue references would be:
> > 
> > >   std::__push_heap(__first, _DistanceType((__last - __first) - 1),
> > >                    _DistanceType(0),
> > >                    _ValueType(_GLIBCXX_MOVE(*(__last - 1))));
> 
> I tried doing this and it didn't seem to help the testcase attached here.

push_heap(): default_ctors=0, copy_ctors=0, copy_assignments=0, swaps=0,
[-cheap_dtors=1998,-] {+cheap_dtors=999,+} expensive_dtors=0,
[-move_ctors=1998,-] {+move_ctors=999,+} cheap_move_assignments=2201,
expensive_move_assignments=0, comparisons=2196

It doesn't help the other operations, but it has some effect on this one.

[Bug middle-end/94412] wrong code with -fsanitize=undefined and vectors

2020-03-30 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94412

Marc Glisse  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-03-30
 Status|UNCONFIRMED |NEW

--- Comment #1 from Marc Glisse  ---
Actually it seems to me that the code is only right with -fsanitize=undefined,
it has to abort.
We replace -v/11u by v/-11u because in fold-const.c we check:
  if ((!INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_UNDEFINED (type))
instead of ANY_INTEGRAL_TYPE_P for instance.

[Bug c++/94141] c++20 rewritten operator== recursive call mixing friend and external operators for template class

2020-04-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94141

--- Comment #1 from Marc Glisse  ---
It looks like clang-10+ also generates an infinite loop on this code. Does the
standard really give priority to some implicit function over a user-defined one
that is an exact match?

[Bug c++/94141] c++20 rewritten operator== recursive call mixing friend and external operators for template class

2020-04-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94141

--- Comment #2 from Marc Glisse  ---
Ah, maybe the friend function is not quite a template, so the generated swapped
function is not a template either, and thus it has priority over a template if
both are exact matches?

This is going to break a number of users of boost/operators.hpp, and possibly
other mixins using a similar technique.

[Bug middle-end/62080] Suboptimal code generation with eigen library

2020-04-06 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62080

Marc Glisse  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-04-06

--- Comment #8 from Marc Glisse  ---
Even with gcc-4.8.4 (the oldest I have), I cannot reproduce the original
report. Maybe Eigen changed since then. That's why we ask for self-contained
testcases (possibly just the preprocessed source code).

[Bug c++/94314] [10 Regression] Optimizing mismatched new/delete pairs since r10-2106-g6343b6bf3bb83c87

2020-04-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94314

--- Comment #10 from Marc Glisse  ---
I am still getting -1 at -O2 for

#include <stdio.h>
#include <new>
int count = 0;
__attribute__((malloc,noinline))
void* operator new[](unsigned long sz){++count;return ::operator new(sz);}
void operator delete[](void* ptr)noexcept{--count;::operator delete(ptr);}
void operator delete[](void* ptr, std::size_t sz)noexcept{--count;::operator
delete(ptr, sz);}
int main(){
  delete[] new int[1];
  printf("%d\n",count); // Should print 0.
}

I am not aware of any code that breaks in practice, but it still looks strange.

[Bug tree-optimization/94566] New: conversion between std::strong_ordering and int

2020-04-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94566

Bug ID: 94566
   Summary: conversion between std::strong_ordering and int
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

#include <compare>

int conv1(std::strong_ordering s){
  if(s==std::strong_ordering::less) return -1;
  if(s==std::strong_ordering::equal) return 0;
  if(s==std::strong_ordering::greater) return 1;
  __builtin_unreachable();
}
std::strong_ordering conv2(int i){
  switch(i){
    case -1: return std::strong_ordering::less;
    case 0: return std::strong_ordering::equal;
    case 1: return std::strong_ordering::greater;
    default: __builtin_unreachable();
  }
}

Compiling with -std=gnu++2a -O3. I would like the compiler to notice that those
are just NOP (at most a sign-extension). Clang manages it for conv2. Gcc
generates:

        movl    $-1, %eax
        cmpb    $-1, %dil
        je      .L1
        xorl    %eax, %eax
        testb   %dil, %dil
        setne   %al
.L1:
        ret

and

        xorl    %eax, %eax
        testl   %edi, %edi
        je      .L10
        cmpl    $1, %edi
        sete    %al
        leal    -1(%rax,%rax), %eax
.L10:
        ret


(apparently the C++ committee thinks it is a good idea to provide a type that
is essentially an int that can only be -1, 0 or 1, but not provide any direct
way to convert to/from int)

[Bug tree-optimization/94589] New: Optimize (i<=>0)>0 to i>0

2020-04-13 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

Bug ID: 94589
   Summary: Optimize (i<=>0)>0 to i>0
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

g++-10 -std=gnu++2a -O3

#include <compare>
bool k(int i){
  auto c=i<=>0;
  return c>0;
}

  <bb 2> [local count: 1073741824]:
  if (i_1(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870913]:
  _2 = i_1(D) >= 0;

  <bb 4> [local count: 1073741824]:
  # prephitmp_6 = PHI <_2(3), 0(2)>
  return prephitmp_6;

For most comparisons @ we do optimize (i<=>0)@0 to just i@0, but not for > and
<=. Spaceship operator<=> is very painful to use, but I expect we will end up
seeing a lot of it with C++20, and comparing its result with 0 is almost the
only way to use its output, so it seems important to optimize this common case.

(there is probably a very old dup, but I couldn't find it)

[Bug tree-optimization/94566] conversion between std::strong_ordering and int

2020-04-15 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94566

--- Comment #3 from Marc Glisse  ---
I thought we had code to recognize a switch that represents a linear function,
I was hoping that it would kick in with your hoisting patch...

[Bug rtl-optimization/94798] Failure to optimize subtraction and 0 literal properly

2020-04-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94798

--- Comment #1 from Marc Glisse  ---
(In reply to Gabriel Ravier from comment #0)
> Comparison here : https://godbolt.org/z/LZ8dBy

In your future bug reports, could you please copy all relevant information
instead of (or in addition to) linking to some external website just to show 3
lines of asm? Thanks.

[Bug tree-optimization/94801] Failure to optimize narrowed __builtin_clz

2020-04-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94801

--- Comment #1 from Marc Glisse  ---
Gcc considers that clz might return 32 on some platforms, it does not currently
use target-specific information to restrict the range of clz output.

[Bug tree-optimization/94801] Failure to optimize narrowed __builtin_clz

2020-04-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94801

--- Comment #2 from Marc Glisse  ---
  if(a==0)__builtin_unreachable();

lets gcc optimize the code.
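
A sketch of how that hint could be placed. The function from the report is not quoted in this thread, so the shape below (shifting the result of __builtin_clz right by 5) is an assumption, as are the name clz_high_bits and the 32-bit width of unsigned.

```cpp
// Assumed shape of the reported function. For a nonzero 32-bit argument,
// __builtin_clz returns a value in [0, 31], so the shifted result is 0.
int clz_high_bits(unsigned a) {
    // Tells the compiler that a is never 0, the one input for which
    // __builtin_clz has an undefined result.
    if (a == 0) __builtin_unreachable();
    return __builtin_clz(a) >> 5;
}
```

Calling this function with 0 is undefined behavior, so the hint is only appropriate when the caller can guarantee a nonzero argument.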

[Bug libstdc++/94811] Please make make_tuple noexcept when possible

2020-04-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94811

Marc Glisse  changed:

   What|Removed |Added

  Component|c++ |libstdc++
   Severity|normal  |enhancement

--- Comment #1 from Marc Glisse  ---
Sure, it is possible, but isn't std::make_tuple mostly legacy at this point?
With CTAD you can just use std::tuple, which is noexcept already.

Each extra noexcept is one more chance to get things wrong, at least as long as
wg21 refuses noexcept(auto), although this particular case doesn't seem
particularly hard.
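
A short sketch of the CTAD alternative, assuming C++17 or later. The two helper names are hypothetical. For plain value arguments both spellings produce the same tuple type.

```cpp
#include <tuple>
#include <type_traits>

// Pre-C++17 spelling.
inline auto with_make_tuple(int a, double b) { return std::make_tuple(a, b); }

// C++17 class template argument deduction: no helper function needed.
inline auto with_ctad(int a, double b) { return std::tuple(a, b); }

static_assert(std::is_same_v<decltype(with_make_tuple(1, 2.0)),
                             decltype(with_ctad(1, 2.0))>,
              "both spellings yield std::tuple<int, double>");
```

One difference to keep in mind: std::make_tuple unwraps std::reference_wrapper arguments into references, while CTAD stores the wrapper itself.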

[Bug rtl-optimization/94804] Failure to elide useless movs in 128-bit addition

2020-04-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804

Marc Glisse  changed:

   What|Removed |Added

   Keywords||ra

--- Comment #2 from Marc Glisse  ---
Gcc's register allocation is not well optimized for hard registers at function
boundaries (inlining makes this case not very important), there are several
related bug reports. It would be nice to improve that, but it is likely to get
lower priority than if you can find a similar issue in the middle of a hot
loop.

[Bug rtl-optimization/94804] Failure to elide useless movs in 128-bit addition

2020-04-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804

--- Comment #4 from Marc Glisse  ---
(In reply to Gabriel Ravier from comment #3)
> Having similar problems with useless movs is from the same non
> well-optimized register allocation on function boundaries ?

I don't know, but possibly not. I'll shut up because I am not a RA
specialist...
(and if you expect to see it optimized to bswap64, then obviously it is
unrelated to register allocation)

[Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns

2020-05-01 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #1 from Marc Glisse  ---
Even if we write __builtin_shuffle, the vector lowering pass turns it into the
same code (constructor of BIT_FIELD_REFs), which seems to indicate that the
target does not handle this pattern.

[Bug tree-optimization/94911] Failure to optimize comparisons of VLA sizes

2020-05-01 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94911

--- Comment #1 from Marc Glisse  ---
gcc computes sizeof(a) as 4ul*(size_t)n, and unsigned types don't provide nice
overflow guarantees, so that complicates things.

[Bug tree-optimization/94911] Failure to optimize comparisons of VLA sizes

2020-05-01 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94911

--- Comment #3 from Marc Glisse  ---
Since VLA is an extension for compatibility with C, it is strange that it
behaves differently (does one use the value of n at the time of the typedef and
the other at the time of the declaration?). This bug is about the optimization,
maybe a separate report about the C++ behavior would make sense.

[Bug c++/94905] Bogus warning -Werror=maybe-uninitialized

2020-05-01 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94905

--- Comment #1 from Marc Glisse  ---
Several of us asked, and it was rejected. Your next step is to provide a
self-contained testcase (preprocessed sources?). You may also want to check if
it still warns in gcc-10.

[Bug tree-optimization/94919] Failure to recognize max pattern

2020-05-02 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94919

--- Comment #1 from Marc Glisse  ---
This seems related to another one you reported, in the category: i&-b == b?i:0
(for b∈{0,1}). The first form has the advantage of no branch, while the second
is less obfuscated and simplifies more naturally (like when we combine x<0||y<0
to (x|y)<0 we get something shorter, but more obfuscated which can hinder other
optimizations). Here this equivalence gives the intermediate step of
(x>=y?x^y:0)^y.
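
The two equivalences mentioned above can be spelled out as follows. This is only a sketch with hypothetical function names; the mask form uses unsigned arithmetic so that negating b is well defined.

```cpp
// i & -b: for b == 0 the mask is 0, for b == 1 it is all ones.
unsigned select_mask(unsigned i, bool b) {
    return i & -static_cast<unsigned>(b);
}

// b ? i : 0, the less obfuscated form of the same selection.
unsigned select_branch(unsigned i, bool b) {
    return b ? i : 0u;
}

// The intermediate step from the comment: (x >= y ? x ^ y : 0) ^ y.
// If x >= y this is x ^ y ^ y == x, otherwise 0 ^ y == y, i.e. max(x, y).
int max_via_xor(int x, int y) {
    return (x >= y ? x ^ y : 0) ^ y;
}

int max_plain(int x, int y) {
    return x >= y ? x : y;
}
```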

I didn't see if you posted it somewhere, but I assume those tests come from the
llvm testsuite? Did people really hit such weird code in the wild and add them
one by one to llvm? Or was there some automated process involved? I could
imagine looking at all the expressions of a certain size using some list of
operators and only 2 variables (and maybe a few constants), compute the table
of values for a small type (a 3-bit integer?), put them in a hash table, and
study collisions? Generating completely automatically the list of
transformations may be a bit hard though. And handling undefined cases (like
signed overflow) complicates things, since we are not looking for exact matches
there.

[Bug tree-optimization/94921] Failure to optimize nots with sub into single add

2020-05-02 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94921

--- Comment #1 from Marc Glisse  ---
x + y ?
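
The identity behind this suggestion can be checked directly. The report's expression is not quoted in this comment, so the form ~(~x - y) used below is an assumption based on the bug title, and the function names are hypothetical. In two's complement ~a equals -a - 1, so ~(~x - y) = -((-x - 1) - y) - 1 = x + y.

```cpp
// Assumed form of the reported expression: two nots around a subtraction.
// Unsigned operands keep the wraparound well defined.
unsigned nots_with_sub(unsigned x, unsigned y) {
    return ~(~x - y);
}

// The single addition it is equivalent to.
unsigned single_add(unsigned x, unsigned y) {
    return x + y;
}
```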

[Bug tree-optimization/94930] Failure to optimize out subvsi in expansion of __builtin_memcmp with 1 as the operand with -ftrapv

2020-05-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94930

--- Comment #1 from Marc Glisse  ---
AFAIK -ftrapv doesn't work very well and is kind of abandoned, in favor of
-fsanitize=signed-integer-overflow (possibly with
-fsanitize-undefined-trap-on-error), which does generate the code you expect.

[Bug tree-optimization/94914] Failure to optimize check of high part of 64-bit result of 32 by 32 multiplication into overflow check

2020-05-04 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94914

--- Comment #4 from Marc Glisse  ---
I thought we might already simplify (u >> 32) != 0 to u >= cst (other possible
forms are u != (uint64_t)(uint32_t)u, (u & cst) != 0, etc, I am trying to think
which one looks most canonical).

I expect in interesting cases the code will use z = (uint64_t)x * y twice, once
to check for overflow, and once as (uint32_t)z to get the actual result (or
there is a separate x*y and we want to CSE it with the overflow version).
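
The candidate forms listed above, and the double use of z, can be sketched as follows. All function names are hypothetical; the constants are the ones implied by a 32-bit high part.

```cpp
#include <cstdint>

// Four equivalent tests for "the high 32 bits of u are nonzero".
bool high_shift(std::uint64_t u)     { return (u >> 32) != 0; }
bool high_compare(std::uint64_t u)   { return u >= (std::uint64_t{1} << 32); }
bool high_roundtrip(std::uint64_t u) { return u != static_cast<std::uint64_t>(static_cast<std::uint32_t>(u)); }
bool high_mask(std::uint64_t u)      { return (u & 0xFFFFFFFF00000000ull) != 0; }

// The "interesting case": z is used once for the overflow check and once,
// truncated, as the actual 32-bit product.
bool mul_overflows(std::uint32_t x, std::uint32_t y, std::uint32_t* result) {
    std::uint64_t z = static_cast<std::uint64_t>(x) * y;
    *result = static_cast<std::uint32_t>(z);
    return high_shift(z);
}
```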

[Bug tree-optimization/95001] std::terminate() and abort() do not have __builtin_unreachable() semantics

2020-05-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95001

--- Comment #3 from Marc Glisse  ---
Simpler example:

[[noreturn]] void theend();

int f(int x){
  if(x&7)theend();
  return x&3;
}

(or replace "theend()" with "throw 42")

We shouldn't compute x&3, it is always 0 in the branch where it is computed.

But this simple example would probably be doable similarly to VRP, while the
original example requires that the information be available during another pass
(on-demand non-zero bits, like there is work on on-demand ranges?).

[Bug c/95044] [10/11 Regression] -Wreturn-local-addr false alarm since r10-1741-gaac9480da1ffd037

2020-05-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95044

--- Comment #1 from Marc Glisse  ---
I think there is another very similar bug report.

  # buf_1 = PHI <&stack_buf(2), buf_15(6)>
[...]
  if (&stack_buf != buf_1)

in each branch, we thus know what buf_1 is, so we could replace it with buf_15
in

  # _3 = PHI <_17(5), buf_1(4)>
  return _3;

(or is that bad for register pressure?)

Or use it as a hint to thread that path, or add some logic to
Wreturn_local_addr, but that's getting more complicated.

[Bug libstdc++/95065] Remove std::bind1st and std::bind2nd when building in -std=c++17

2020-05-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95065

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Marc Glisse  ---
They are kept on purpose.

*** This bug has been marked as a duplicate of bug 91383 ***

[Bug libstdc++/91383] C++17 should remove some library feature deprecated in C++14

2020-05-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91383

Marc Glisse  changed:

   What|Removed |Added

 CC||gcc at linkmauve dot fr

--- Comment #6 from Marc Glisse  ---
*** Bug 95065 has been marked as a duplicate of this bug. ***

[Bug target/95115] [10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised

2020-05-13 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115

--- Comment #1 from Marc Glisse  ---
I am seeing the same thing on x86_64, happens during FRE1, so it looks like
tree-optimization.

[Bug target/95115] [10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised

2020-05-13 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115

--- Comment #2 from Marc Glisse  ---
Or during CCP with the simpler

double f(){
  double d=__builtin_inf();
  return d/d;
}

and all the -frounding-math -ftrapping-math -fsignaling-nans don't seem to
help.

[Bug target/95115] [10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised

2020-05-13 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115

--- Comment #4 from Marc Glisse  ---
(In reply to Jim Wilson from comment #3)
> The assumption here seems to be that if the user is
> dividing constants, then we don't need to worry about setting exception
> bits.  If I write (4.0 / 3.0) for instance, the compiler just folds it and
> doesn't worry about setting the inexact bit.

We don't fold 0./0. though (unless -fno-trapping-math), it would make sense for
Inf/Inf to behave the same. And -frounding-math protects 4./3.

> divide gets moved after the fetestexcept call.  That looks like a gcc bug

Yes, that one is known.

[Bug tree-optimization/95246] Failure to optimize comparison between differently signed chars

2020-05-20 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95246

--- Comment #1 from Marc Glisse  ---
On which version of LLVM did you see that? For me, gcc produces

        movzbl  %dil, %edi
        movsbl  %sil, %esi
        cmpl    %esi, %edi
        setg    %al

while clang skips the first 2 lines (but still emits movl), assuming that the
input is already signed/zero extended, which points at ABI conventions. The
transformation you suggest doesn't seem right to me.

[Bug c++/94141] c++20 rewritten operator== recursive call mixing friend and external operators for template class

2020-05-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94141

Marc Glisse  changed:

   What|Removed |Added

   Last reconfirmed||2020-05-21
 Status|UNCONFIRMED |SUSPENDED
 Ever confirmed|0   |1

--- Comment #3 from Marc Glisse  ---
It seems that this is as currently specified in C++20, but that some people are
going to try and change the rules to avoid breaking code like this.

[Bug sanitizer/95279] UBSan doesn't seem to detect pointer overflow in certain cases

2020-05-25 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95279

--- Comment #8 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #4)
> There is nothing wrong on addition of -1, whether signed or cast to
> size_t/uintptr_t, to a pointer,

Looking at the standard (I am not a pro at that), one could easily interpret
that p+(size_t)(-1) means adding a huge number to p, not subtracting 1. It does
not say that the integer is cast to ptrdiff_t or anything like that.

[Bug sanitizer/95279] UBSan doesn't seem to detect pointer overflow in certain cases

2020-05-25 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95279

--- Comment #12 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #10)
> 1 + (size_t) -1 give 0

It wasn't obvious to me that the operation was supposed to happen in some C/C++
type (they don't say which one) or in a mathematical, infinite-precision sense.
After all, they write 0≤i−j≤n which shouldn't be interpreted as C++ code. But
you have more experience with reading these things, I believe you.
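
The two readings can be put side by side in a small sketch. The function names are invented for this example; it only shows the integer arithmetic and deliberately does no pointer arithmetic.

```cpp
#include <cstddef>
#include <cstdint>

// Reading 1: the sum is computed in size_t, where arithmetic is modular,
// so 1 + (size_t)-1 wraps around to 0.
constexpr std::size_t wrapped_sum()
{
  return std::size_t(1) + static_cast<std::size_t>(-1);
}

// Reading 2: (size_t)-1 taken as a mathematical value is SIZE_MAX, a huge
// positive offset rather than "minus one".
constexpr std::size_t huge_offset()
{
  return static_cast<std::size_t>(-1);
}
```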

[Bug c++/95351] Comparison with NAN optimizes incorrectly with -ffast-math disabled

2020-05-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95351

--- Comment #2 from Marc Glisse  ---
It might not be the issue, but merge_truthop_with_opposite_arm has a suspicious
HONOR_NANS (type) where type is bool: the result of the comparison instead of
one of the arguments.

[Bug middle-end/95353] [10/11 Regression] GCC can't build binutils

2020-05-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95353

--- Comment #3 from Marc Glisse  ---
Do you need fr_literal to have size at least 1 (say, when creating an object on
the stack), or can you use the official flexible array member (drop the 1, just
[] in the declaration)?

[Bug tree-optimization/95393] Failure to optimize loop condition arithmetic for mismatched types

2020-05-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95393

--- Comment #1 from Marc Glisse  ---
It does optimize for me with -O2 or -O3. It could optimize earlier though: by
the end of gimple, we are still trying to return max(s,0).

[Bug tree-optimization/95423] Failure to optimize separated multiplications by x and square of x

2020-05-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95423

--- Comment #3 from Marc Glisse  ---
We manage it with -fwrapv. This should happen late when we don't care about
overflow anymore, or it needs to introduce casts to an unsigned type.

[Bug tree-optimization/95433] Failure to completely optimize simple compare after operations

2020-05-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95433

Marc Glisse  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-05-30

--- Comment #1 from Marc Glisse  ---
It is 2*x==-2 that we fail to simplify. match.pd has code for x*2==y*2 or
x*2==0 or even x*2.==-2. for floats, but apparently not for the special case of
other constants for integers.
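
A small sketch of why the integer case needs some care (the function names are made up for this example): for signed int, overflow is undefined, so 2*x==-2 can only hold for x==-1, while for unsigned the multiplication wraps and there are two solutions.

```cpp
// Signed int: overflow is undefined, so 2*x == -2 can only hold for x == -1.
bool twice_is_minus_two(int x) { return 2 * x == -2; }
bool is_minus_one(int x)       { return x == -1; }

// Unsigned: arithmetic wraps modulo 2^N, so 2*x == -2u has two solutions
// (the top bit of x is discarded by the multiplication).
bool twice_is_minus_two_u(unsigned x) { return 2u * x == -2u; }
```

For unsigned, both ~0u and ~0u >> 1 satisfy the equation, so a simplification to a single equality would only be valid for the signed (non-wrapping) case.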

[Bug target/95435] bad builtin memcpy performance with znver1/znver2 and 32bit

2020-05-30 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435

Marc Glisse  changed:

   What|Removed |Added

 Target||x86-*-*

--- Comment #3 from Marc Glisse  ---
"regression" means that a new version of gcc is working worse than an older
one, can you mention which older version you are comparing to and what results
you were getting with it?

[Bug tree-optimization/95489] Failure to optimize x && (x & y) to x & y

2020-06-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95489

--- Comment #2 from Marc Glisse  ---
(In reply to Richard Biener from comment #1)
>   (bit_and (ne (bit_and x_3 y_4) 0) (ne x_3 0))

This could be simplified

> where I'd say we miss
> 
>   (bit_and (ne @0 integer_zerop) (ne @1 integer_zerop))
> 
> ->
> 
>   (ne (bit_and @0 @1) integer_zerop)

This only seems possible for 1-bit types: 1!=0 & 2!=0 is not (1&2)!=0

To me, this falls in the general category of (x!=a)?f(x):y where y happens to
be f(a) and f is not as costly as a condition+jump. I handled a few such cases
a while ago with neutral_element_p, but it could be much more general (I am not
saying it is easy).
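
The counterexample with 1 and 2 can be checked directly. This is only a sketch, and the function names are invented for it.

```cpp
// (x != 0) & (y != 0): true when both operands are nonzero.
bool both_nonzero(unsigned x, unsigned y) { return (x != 0) & (y != 0); }

// (x & y) != 0: true only when the operands share a set bit.
bool share_a_bit(unsigned x, unsigned y) { return (x & y) != 0; }
```

With x=1 and y=2 the first is true and the second is false, so the two forms only agree when every nonzero value has the same single bit set, as in a 1-bit type.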

[Bug c++/95384] Poor codegen cause by using base class instead of member for Optional construction

2020-06-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95384

--- Comment #2 from Marc Glisse  ---
Or with fewer templates:

struct A {
  A() = default;
  union S {
    constexpr S() noexcept : e() { }
    struct {} e;
    int i;
  } s;
  bool b = false;
};
struct B : A {
  B() = default;
  using A::A;
};
B f() { return {}; }


movl    $0, -12(%rsp)
movq    -16(%rsp), %rax
ret

instead of a plain

xorl    %eax, %eax

optimized dump is

  MEM  [(struct B *)&D.2265 + 4B] = {};
  MEM[(union S *)&D.2265] ={v} {CLOBBER};
  D.2275 = D.2265;
  D.2265 ={v} {CLOBBER};
  return D.2275;

expand

(insn 5 2 6 2 (set (mem/c:SI (plus:DI (reg/f:DI 77 virtual-stack-vars)
(const_int -12 [0xfff4])) [7 MEM  [(struct
B *)&D.2265 + 4B]+0 S4 A32])
(const_int 0 [0])) "o.cc":14:17 -1
 (nil))
(insn 6 5 7 2 (set (reg:DI 83)
(mem/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
(const_int -16 [0xfff0])) [7 D.2265+0 S8 A64]))
"o.cc":14:17 -1
 (nil))
(insn 7 6 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
(const_int -8 [0xfff8])) [7 D.2275+0 S8 A64])
(reg:DI 83)) "o.cc":14:17 -1
 (nil))
(insn 8 7 9 2 (set (reg:DI 84)
(const_int 0 [0])) "o.cc":14:17 -1
 (nil))
(insn 9 8 10 2 (set (reg:DI 84)
(mem/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
(const_int -8 [0xfff8])) [7 D.2275+0 S8 A64]))
"o.cc":14:17 -1
 (nil))
(insn 10 9 11 2 (set (reg:DI 85)
(reg:DI 84)) "o.cc":14:17 -1
 (nil))
(insn 11 10 15 2 (set (reg:DI 82 [  ])
(reg:DI 85)) "o.cc":14:17 -1
 (nil))
(insn 15 11 16 2 (set (reg/i:DI 0 ax)
(reg:DI 82 [  ])) "o.cc":14:20 -1
 (nil))
(insn 16 15 0 2 (use (reg/i:DI 0 ax)) "o.cc":14:20 -1
 (nil))

and dfinish

(insn:TI 5 2 15 2 (set (mem/c:SI (plus:DI (reg/f:DI 7 sp)
(const_int -12 [0xfff4])) [7 MEM  [(struct
B *)&D.2265 + 4B]+0 S4 A32])
(const_int 0 [0])) "o.cc":14:17 67 {*movsi_internal}
 (nil))
(insn 15 5 16 2 (set (reg/i:DI 0 ax)
(mem/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int -16 [0xfff0])) [7 D.2265+0 S8 A64]))
"o.cc":14:20 66 {*movdi_internal}
 (nil))
(insn 16 15 25 2 (use (reg/i:DI 0 ax)) "o.cc":14:20 -1
 (nil))

[Bug middle-end/4210] should not warn in dead code

2020-06-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

--- Comment #43 from Marc Glisse  ---
(In reply to Niels Möller from comment #42)
> And what's the easiest way to run the right compiler process (I guess
> that's cc1) under gdb?

gcc -c t.c -wrapper gdb,--args

[Bug libstdc++/95561] std::is_signed_v<__int128> is false

2020-06-06 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95561

--- Comment #1 from Marc Glisse  ---
Are you using -std=gnu++17 or -std=c++17 ?

[Bug tree-optimization/95643] Optimizer fails to realize that a variable tested twice in a row is the same both times

2020-06-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95643

--- Comment #1 from Marc Glisse  ---
After FRE1 we have

  _2 = x_9(D) == 0;
  if (_2 != 0)

so we assert things for _2 and not x_9, and we lose the __builtin_unreachable
information in CCP2.

[Bug libstdc++/90436] Redundant size checking in vector

2020-06-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90436

--- Comment #2 from Marc Glisse  ---
(writing down some notes)

Calling

  size_type
  _M_check_len_one(const char* __s) const
  {
    if (max_size() - size() < 1)
      __throw_length_error(__N(__s));

    const size_type __len = size() + (std::max)(size(), (size_t)1);
    return (__len > max_size()) ? max_size() : __len;
  }

instead of _M_check_len reduces the running time of this micro-benchmark

#include <vector>
int main(){
  volatile int a=0;
  for(int i=0;i<100;++i){
    std::vector<int> v;
    for(int j=0;j<1000;++j){
      v.push_back(j);
    }
    a=v[a];
  }
}

from .88s to .66s at -O3. Two key elements (the perf gain only comes if we do
both) are removing the overflow check, and having the comparison between size
and max_size optimized to be done on byte length (not divided by the element
size).

I think the overflow check could be removed from the normal _M_check_len: we
have already checked that max_size() - size() >= __n so size() + __n cannot
overflow, and size() must be smaller than max_size(), which should be at most
SIZE_MAX/2, at least if ptrdiff_t and size_t have the same size, so size() +
size() cannot overflow either.
I should check if the compiler could help more. It is supposed to know how to
optimize .ADD_OVERFLOW based on the range of the operands.

I suspect that a single_use restriction explains why max_size() == size()
compares values without division while max_size() - size() < __n (for __n = 1)
doesn't.
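
The no-overflow argument can be written down as a stand-alone model. This is a sketch, not the libstdc++ code: check_len_one here is a free function invented for the example, with size and max_size passed in explicitly, and it assumes max_size is at most SIZE_MAX/2.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-alone model of the growth computation.
// Assumes size <= max_size <= SIZE_MAX / 2.
std::size_t check_len_one(std::size_t size, std::size_t max_size)
{
  if (max_size - size < 1)               // no room for one more element
    throw std::length_error("check_len_one");

  // size <= SIZE_MAX / 2, so size + max(size, 1) cannot wrap around and
  // no separate overflow test is needed before clamping to max_size.
  const std::size_t len = size + std::max<std::size_t>(size, 1);
  return len > max_size ? max_size : len;
}
```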

[Bug libstdc++/90436] Redundant size checking in vector

2020-06-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90436

--- Comment #3 from Marc Glisse  ---
  // possibly assumes that ptrdiff_t and size_t have the same size
  size_type
  _M_check_len_one(const char* __str) const
  {
    ptrdiff_t __n = sizeof(_Tp);
    ptrdiff_t __ms = max_size(); __ms *= sizeof(_Tp);
    ptrdiff_t __s = size(); __s *= sizeof(_Tp);
    if (__s > (__ms - __n))
      __throw_length_error(__N(__str));

    const ptrdiff_t __len = __s + (std::max)(__s, __n);
    if (__len <= 0) __builtin_unreachable();
    ptrdiff_t __ret = (std::min)(__len, __ms);
    // hack to generate divexact, so it simplifies with * sizeof(_Tp)
    return (_Tp*)__ret-(_Tp*)0;
  }

generates nicer code. But after those experiments, it seems clear that the
performance of this code is irrelevant (not surprising since it is followed by
a call to operator new), and its effect on global performance is random.
Possibly it causes something to get aligned differently, which can randomly get
this 25% speed-up, but can just as randomly go back to the slow version.
Anyway, I don't think I'll be submitting any patch for this.

[Bug libstdc++/90436] Redundant size checking in vector

2020-06-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90436

--- Comment #4 from Marc Glisse  ---
(side note not related to the redundant size checking)
It is surprising how, in the code from comment 2, adding v.reserve(1000) does
not help, it even slows the program down slightly here (yes, that's rather hard
to believe). To reap the benefits, I also need to add in the loop:
if(v.size()==v.capacity())__builtin_abort();
which enables the compiler to remove the reallocation code, and once that code
is removed it can actually prove that size never reaches capacity and remove
the call to abort! We don't even need __builtin_unreachable there. And once all
that dead code is removed, it can finally vectorize.
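
For reference, the loop with both additions could look like the sketch below. The function name and the parameter n are invented for this example, and whether the compiler then removes the reallocation path and vectorizes depends on the gcc version and flags.

```cpp
#include <vector>

// Fills a vector with 0..n-1 after reserving room for all elements.
// __builtin_abort is a GCC/Clang builtin; the check is the hint described
// above and is never expected to fire here.
std::vector<int> fill_reserved(int n)
{
  std::vector<int> v;
  v.reserve(n);
  for (int j = 0; j < n; ++j) {
    if (v.size() == v.capacity())
      __builtin_abort();   // tells the compiler push_back never reallocates
    v.push_back(j);
  }
  return v;
}
```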

[Bug tree-optimization/95801] Optimiser does not exploit the fact that an integer divisor cannot be zero

2020-06-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95801

--- Comment #1 from Marc Glisse  ---
Except when dereferencing a pointer (?), gcc seldom uses an operation to derive
properties on the operands, it mostly derives properties on the result. That's
in large part because the information you are getting on the operands is only
valid in some regions, not for the whole life of the SSA_NAME (in if(y!=0)x/y;
the division obviously doesn't allow removing the earlier test for y!=0).
There could be many cases:
x/y => y is not 0
i+1 => i is not INT_MAX
x/[ex]4 => the last 2 bits of x are 0
ptr+n or *ptr => ptr is not a null pointer

There is code in isolate-path to handle operands that are potentially 0, but I
think that's only when we see x / PHI, not for a "normal" divisor.

VRP works around the issue by creating extra SSA_NAMEs for the regions where we
know more about a variable, but it only does it for branches like if(x<10),
doing it for the operands of every operation would be too costly.

[Bug c/95818] wrong "used uninitialized" warning

2020-06-22 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95818

--- Comment #3 from Marc Glisse  ---
Richard said "complete", that is the whole .i file, not just one random
function. If we cannot reproduce the issue by copying your code and compiling
it, we can't do anything about your report.

[Bug tree-optimization/95906] Failure to recognize max pattern with mask

2020-06-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95906

--- Comment #1 from Marc Glisse  ---
I'd say generate a (vec_)cond_expr, not directly a max. That is, replace the
comparison with any truth_valued_p (hmm, that function probably stopped working
for vectors when all comparisons were wrapped in vec_cond for avx512).

[Bug tree-optimization/95663] static_cast checks for null even when the pointer is dereferenced

2020-06-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95663

--- Comment #12 from Marc Glisse  ---
(In reply to Jeffrey A. Law from comment #10)
> __builtin_trap emits an actual trap into the instruction stream which halts
> the process immediately which is *much* better from a security standpoint

Regardless of what the default is, I think we should be able to agree that
there are uses where we want to favor hardening/security (public facing
servers, web browsers), and others where performance is more important
(scientific simulations), and it would be nice to give users a choice.
(I think sanitizers already provide a way to turn __builtin_unreachable into
__builtin_trap, but that's more meant for explicit __builtin_unreachable in
user code)

[Bug tree-optimization/95926] Failure to optimize xor pattern when using temporary variable

2020-06-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95926

--- Comment #1 from Marc Glisse  ---
It is different to gcc because in the first case, tmp is used twice, while in
the second case, each a&b is only used once, and gcc only transforms (a&b)^b to
b&~a if this is the only use of a&b. Yes, this heuristic often backfires, but
as long as we consider &~ as 2 operations, not restricting the transformation
could generate worse code in some cases.
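
The identity behind the transformation is easy to check exhaustively on small values. This is a sketch, with a function name made up for it.

```cpp
// Checks (a & b) ^ b == b & ~a for every pair of 8-bit values.
bool xor_identity_holds()
{
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      if (((a & b) ^ b) != (b & ~a))
        return false;
  return true;
}
```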

[Bug tree-optimization/95924] Failure to optimize some bit magic to one of the operands

2020-06-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95924

--- Comment #1 from Marc Glisse  ---
* If I replace ~a with !a, we manage to do everything with type bool. With ~a,
we don't, we stick to int.

* We don't handle a?b:false the same as a&&b.

* Even for (a | !b) && (!(!a & b) && a) we don't completely simplify, because
that would be replacing too many && with & (I think). If I manually replace one
&& with &, gcc manages.

[Bug tree-optimization/95929] Failure to optimize tautological comparisons of comparisons to a single one

2020-06-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95929

--- Comment #1 from Marc Glisse  ---
Here gcc does optimize the first f to (a != 0) ^ (b != 0). However, for the
second f, it does indeed generate something that looks like the first f before
optimization... The optimization for the first f is probably "(X && !Y) || (!X
&& Y) is X ^ Y" in fold-const.c, which may not have an equivalent in match.pd
yet.
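
As a sketch of the fold-const.c rule quoted above (the functions are illustrative and are not the testcase from the report), the long form and the xor form agree on all inputs:

```cpp
// Long form: (X && !Y) || (!X && Y), with X = (a != 0) and Y = (b != 0).
bool xor_long(int a, int b)
{
  bool x = a != 0, y = b != 0;
  return (x && !y) || (!x && y);
}

// Folded form: (a != 0) ^ (b != 0).
bool xor_short(int a, int b) { return (a != 0) ^ (b != 0); }
```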

[Bug tree-optimization/95923] Failure to optimize bool checks into and

2020-06-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95923

--- Comment #1 from Marc Glisse  ---
(With one variant I ended up with (a|b)&(a==b), which we don't optimize to a&b)

We don't optimize !(!a && !b) && !(!a && b) && !(a && !b) (we keep several
branches), but we do optimize if I manually replace enough && with &.
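
Both expressions mentioned here reduce to a&b for bool operands, which a four-row truth table confirms. The sketch below uses names invented for the example.

```cpp
// (a | b) & (a == b) on bools: true only when both are true.
bool variant(bool a, bool b) { return (a | b) & (a == b); }

// The three-clause form: excludes (0,0), (0,1) and (1,0).
bool three_clauses(bool a, bool b)
{
  return !(!a && !b) && !(!a && b) && !(a && !b);
}

// Exhaustive check over the four input combinations.
bool both_equal_and()
{
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      if (variant(a, b) != (a && b) || three_clauses(a, b) != (a && b))
        return false;
  return true;
}
```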

[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value

2020-06-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

--- Comment #11 from Marc Glisse  ---
while(!a.isZero());

that doesn't look like something you would find in real code. Are you waiting
for a different thread to modify a? Then you should use an atomic operation.
Are you waiting for the hardware to change something? Use volatile. Do you
really want an infinite loop? Spell it out if(!a.isZero())for(;;);

[Bug tree-optimization/96009] missed optimization with floating point operations and integer literals

2020-07-01 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96009

--- Comment #2 from Marc Glisse  ---
Note that we don't do the optimization if you replace double with long either.

[Bug c++/96065] Move elision of returned automatic variable doesn't happen the variable is enclosed in a block

2020-07-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96065

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Marc Glisse  ---
dup

*** This bug has been marked as a duplicate of bug 51571 ***

[Bug c++/51571] No named return value optimization while adding a dummy scope

2020-07-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51571

Marc Glisse  changed:

   What|Removed |Added

 CC||b7.10110111 at gmail dot com

--- Comment #6 from Marc Glisse  ---
*** Bug 96065 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/96108] Different behavior in DSE pass

2020-07-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96108

--- Comment #4 from Marc Glisse  ---
During optimization, we often have branches with dead code that would exhibit
UB if it was ever executed. Cleaning up those branches as much as possible
helps reduce code size, show that some variables (in the live part of the code)
are constant, etc.

If we see *p=x where p is uninitialized, it doesn't serve much purpose to keep
it as is, we might as well replace it with *0=0 or trap/unreachable depending
on options.

[Bug c++/96121] Uninitialized variable copying not diagnosed

2020-07-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121

--- Comment #2 from Marc Glisse  ---
gcc warns for this at the level of actual instructions, not user code. Since A
is empty, nothing uninitialized is getting copied.

[Bug c++/96121] Uninitialized variable copying not diagnosed

2020-07-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121

--- Comment #3 from Marc Glisse  ---
And this translation unit doesn't actually generate any code at all, so the way
the warning is currently implemented has no chance of even looking at it.

[Bug c++/96121] Uninitialized variable copying not diagnosed

2020-07-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96121

--- Comment #5 from Marc Glisse  ---
Yes, then we are back to the fact that it works for A=int but not for A a class
containing an int.

[Bug libstdc++/96088] Range insertion into unordered_map is less effective than a loop with insertion

2020-07-09 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96088

--- Comment #3 from Marc Glisse  ---

(In reply to Jonathan Wakely from comment #2)
> Or use unordered_map, equal_to<>> which
> should perform better.

Good idea.

> We haven't implemented http://wg21.link/p0919r3 and http://wg21.link/p1690r1
> yet, I wonder if those would help, especially if we make the internal
> helpers available pre-C++20. That could allow the range insertion to use the
> heterogeneous lookup, to avoid creating temporaries. I'm not sure if that
> would be conforming though. Heterogeneous lookup is observably different,
> and not conforming in C++17.

Restricting it to a few standard types like string should not be observable.

> Adding hash::operator()(string_view) is an interesting idea for the
> standard though.

Indeed. If we want to, I think it is possible to add some overloads for when
the argument is exactly const char* or string_view, which should remain
conforming and provide a significant part of the benefits.
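
What makes such overloads plausible is that, since C++17, std::hash is required to give the same value for a std::string and for a std::string_view over the same characters. A minimal sketch of that property (the helper name is made up, and this is not a proposed libstdc++ change):

```cpp
#include <functional>
#include <string>
#include <string_view>

// Compares the hash of a string_view with the hash of a std::string built
// from the same characters; the standard requires them to be equal.
bool hashes_agree(std::string_view sv)
{
  return std::hash<std::string>{}(std::string(sv)) ==
         std::hash<std::string_view>{}(sv);
}
```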

[Bug tree-optimization/96369] Wrong evaluation order of || operator

2020-07-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96369

Marc Glisse  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-07-29
   Keywords||wrong-code
  Component|c   |tree-optimization
 Status|UNCONFIRMED |NEW

[Bug target/96327] Inefficient increment through pointer to volatile on x86

2020-07-30 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96327

--- Comment #5 from Marc Glisse  ---
I don't think bug 3506 has been fixed (its status seems wrong to me). But don't
worry, there are several other duplicates that still have status NEW (bug 50677
for instance).
This is a sensible enhancement request, I think some gcc backends already do a
similar optimization, it simply isn't a priority, because volatile almost means
"don't optimize this".
At least the difference between the gcc and clang codes matches those other
PRs. Not sure why you are talking of address computations.

[Bug tree-optimization/96392] Optimize x+0.0 if x is an integer

2020-07-30 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96392

Marc Glisse  changed:

   What|Removed |Added

   Last reconfirmed||2020-07-30
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Severity|normal  |enhancement

[Bug tree-optimization/95433] Failure to completely optimize simple compare after operations

2020-08-01 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95433

--- Comment #5 from Marc Glisse  ---
Patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551154.html for the
original testcase.

Note that solving univariate polynomial equations *in the integers* (the
rationals are not much harder) is actually rather simple, just enumerate the
divisors of the constant term and evaluate the polynomial on each of them to
check which ones are roots. If someone wants to implement that...
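
A rough sketch of that enumeration, outside of gcc and with names invented for the example. It assumes a nonzero polynomial with coefficients stored lowest degree first, and it ignores long long overflow.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <set>
#include <vector>

// Evaluates the polynomial at x with Horner's rule; coeffs[i] multiplies x^i.
long long eval_poly(const std::vector<long long>& coeffs, long long x)
{
  long long r = 0;
  for (auto it = coeffs.rbegin(); it != coeffs.rend(); ++it)
    r = r * x + *it;
  return r;
}

// Hypothetical helper: integer roots of a nonzero polynomial with integer
// coefficients. Overflow of long long is not handled in this sketch.
std::set<long long> integer_roots(std::vector<long long> coeffs)
{
  std::set<long long> roots;
  // If the constant term is 0, then 0 is a root: divide by x and repeat.
  while (coeffs.size() > 1 && coeffs.front() == 0) {
    roots.insert(0);
    coeffs.erase(coeffs.begin());
  }
  if (coeffs.empty() || coeffs.front() == 0)
    return roots;

  // Any other integer root must divide the (now nonzero) constant term.
  const long long c0 = std::llabs(coeffs.front());
  for (long long d = 1; d <= c0 / d; ++d) {
    if (c0 % d != 0)
      continue;
    for (long long cand : {d, -d, c0 / d, -(c0 / d)})
      if (eval_poly(coeffs, cand) == 0)
        roots.insert(cand);
  }
  return roots;
}
```

For example, {2, -3, 1} stands for x^2 - 3x + 2, whose candidates are the divisors of 2 with either sign.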

[Bug middle-end/96426] New: __builtin_convertvector ICE without lhs

2020-08-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96426

Bug ID: 96426
   Summary: __builtin_convertvector ICE without lhs
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

typedef long long veci __attribute__((vector_size(16)));
typedef double vecf __attribute__((vector_size(16)));
void f(veci v){
  __builtin_convertvector(v,vecf);
}

$ gcc u.c
during GIMPLE pass: veclower
u.c: In function 'f':
u.c:3:6: internal compiler error: Segmentation fault
3 | void f(veci v){
  |  ^

Found by inspection, I wanted to introduce another conversion-like IFN (not
pure) and didn't know where to store the result type in case the lhs is
missing. Here the easiest would be to skip the stmt if there is no lhs, even if
that's not perfect for -O0, but I am also interested in how to handle the
non-pure case, if you have ideas...
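
For comparison, the same conversion with its result used (so the call has a lhs) can be sketched as follows. The helper functions are invented for the example and rely on the GCC/Clang vector extensions.

```cpp
typedef long long veci __attribute__((vector_size(16)));
typedef double vecf __attribute__((vector_size(16)));

// Same conversion as in the report, but with the result used, so the call
// has a left-hand side. __builtin_convertvector is a GCC/Clang builtin.
vecf to_double(veci v)
{
  return __builtin_convertvector(v, vecf);
}

// Hypothetical helpers used only to build an input and read one lane.
veci make_veci(long long a, long long b)
{
  veci v = {a, b};
  return v;
}

double lane(vecf v, int i)
{
  return v[i];
}
```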

[Bug tree-optimization/96433] Failed to optimize (A / N) * N <= A

2020-08-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96433

Marc Glisse  changed:

   What|Removed |Added

  Component|c   |tree-optimization
   Last reconfirmed||2020-08-03
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
Version|tree-ssa|11.0

--- Comment #2 from Marc Glisse  ---
Replacing (A/B)*B with A-A%B makes the transformation a bit simpler (and the
signed variants with ceil/floor), but we probably don't want to do that
transformation all the time (?). Which leaves the rather specialized (A/B)*B
cmp A...
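
For unsigned operands, the identity and the resulting comparison can be checked on small values. This is a sketch with a made-up function name, and it does not cover the signed variants.

```cpp
// Checks, for small unsigned values, that (a / b) * b equals a - a % b and
// therefore never exceeds a.
bool round_down_identity_holds()
{
  for (unsigned a = 0; a < 200; ++a)
    for (unsigned b = 1; b < 50; ++b) {
      unsigned rounded = (a / b) * b;
      if (rounded != a - a % b || rounded > a)
        return false;
    }
  return true;
}
```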

[Bug target/70314] AVX512 not using kandw to combine comparison results

2020-08-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314

--- Comment #4 from Marc Glisse  ---
We now generate for the original testcase

vpcmpd  $1, %zmm3, %zmm2, %k1
vpcmpd  $1, %zmm1, %zmm0, %k0{%k1}
vpmovm2d        %k0, %zmm0

which looks great.

However, using | instead of &, we get

vpcmpd  $1, %zmm1, %zmm0, %k0
vpcmpd  $1, %zmm3, %zmm2, %k1
kmovw   %k0, %eax
kmovw   %k1, %edx
orl     %edx, %eax
kmovw   %eax, %k2
vpmovm2d        %k2, %zmm0

Well, at least gimple did what it could, and it is now up to the target to
handle logical operations on bool vectors / k* registers. There is probably
already another bug report about that...

[Bug tree-optimization/95906] Failure to recognize max pattern with mask

2020-08-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95906

--- Comment #3 from Marc Glisse  ---
With the patch (which only affects vectors), f becomes (a>b)?a:b. It should be
easy to add the corresponding transform to MAX_EXPR in match.pd.

[Bug middle-end/88670] [meta-bug] generic vector extension issues

2020-08-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88670
Bug 88670 depends on bug 70314, which changed state.

Bug 70314 Summary: AVX512 not using kandw to combine comparison results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/70314] AVX512 not using kandw to combine comparison results

2020-08-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70314

Marc Glisse  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Marc Glisse  ---
Ok, there are enough duplicates for that part, this particular bug report was
mostly about the gimple part, which is fixed now. Closing.

[Bug tree-optimization/96513] building terminated with -O3

2020-08-06 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96513

--- Comment #1 from Marc Glisse  ---
What you could do, even if it is private code, is reduce it
(https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction) until it is very small
and doesn't give away any IP, and then post it. Otherwise, there is not much we
can do from your report and we are probably just going to close it...

[Bug target/96528] New: [11 Regression] vector comparisons on ARM

2020-08-07 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96528

Bug ID: 96528
   Summary: [11 Regression] vector comparisons on ARM
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
Target: arm-none-linux-gnueabihf

(see the discussion after
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551468.html )

I am using a compiler configured with --target=arm-none-linux-gnueabihf
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon-fp16

typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));
vi f(vec a,vec b){
return a==5 | b==7;
}

Compiling with -O yields very long scalar code. Adding -fno-tree-forwprop gets
back the nice, vector code. (at higher optimization levels, one may also need
to disable vrp)

This is due to the fact that while the ARM target handles VEC_COND_EXPR just
fine, it does not handle a plain v == w that is not fed directly to a
VEC_COND_EXPR. I was surprised to notice that "grep vec_cmp" gives a number
of lines in the aarch64/ directory, but none in arm/, while AFAIK those neon
instructions are the same. Would it be possible to implement this on ARM as
well? Other middle-end options are also possible, but the difference with
aarch64 makes it tempting to handle it in the target.
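
For readers unfamiliar with the vector extension, here is what the testcase computes, as a sketch with an extra helper (lane_of_f is invented for the example). Each lane of the result is -1 where the condition holds and 0 elsewhere.

```cpp
typedef unsigned int vec __attribute__((vector_size(16)));
typedef int vi __attribute__((vector_size(16)));

// Same function as in the report, with parentheses added for readability.
vi f(vec a, vec b)
{
  return (a == 5) | (b == 7);
}

// Hypothetical helper: broadcasts a and b to all lanes and returns lane 0
// of the result.
int lane_of_f(unsigned a, unsigned b)
{
  vec va = {a, a, a, a};
  vec vb = {b, b, b, b};
  vi r = f(va, vb);
  return r[0];
}
```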

[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable

2020-08-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542

--- Comment #2 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #1)
> In bar, this is optimized, because fold_binary_op_with_conditional_arg
> optimizes
> 255 >> (x ? 1 : 0) into x ? 127 : 255 and when multiplied by two in unsigned
> char this results in x ? 254 : 254.
> We don't have anything comparable in match.pd yet I believe (and should we?).

We have something for VEC_COND_EXPR to fold a op (b?c:d), but not for
COND_EXPR, which you would be unlikely to see in gimple (and the generator of
phiopt transforms from match.pd patterns hasn't appeared yet). Also, we only
have x!=0, and while fold_binary_op_with_conditional_arg tries to handle it
like x!=0?1:0, we indeed don't do anything like that for gimple. And it seems
possibly better suited to forward propagation than backward like match.pd.

> Or shall say VRP try harder if it sees [0, 1] ranges?

If a range has only 2 (or some other small number) values, try propagating each
and see if some variables end up with the same value in both cases? Or if
enough simplifications occur that it is worth introducing a conditional? I am
not sure it would be worth the trouble.

> Though, shouldn't we optimize e.g.
> unsigned
> baz (unsigned int x)
> {
>   if (x >= 4) return 32;
>   return (-1U >> x) * 16;
> }
> too to return x >= 4 ? 32U : -16U; ?
> Not sure where and how to generalize it though.
> Value range of -1U >> [0, 3] is not really useful here, nonzero bits either.
> And having a specialized (const1 >> x) * const2 optimizer based on x's value
> range would work, but not sure if it has a real-world benefit.

And here this is complicated by the fact that we do not narrow the operation,
so it is less obvious that the constant is -1.
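
The arithmetic in both examples can be checked with a short sketch. Here bar is a guess at the shape of the testcase based on Jakub's description (it is not copied from the report), while baz is the function quoted above.

```cpp
// 255 >> (x ? 1 : 0) is 255 or 127; doubling and truncating to unsigned
// char gives 254 in both cases. Assumed shape, not the original testcase.
unsigned char bar(bool x)
{
  return static_cast<unsigned char>((255 >> (x ? 1 : 0)) * 2);
}

// The function from the quoted comment: for x in [0, 3] the product wraps
// to -16U, so the whole function is x >= 4 ? 32U : -16U.
unsigned baz(unsigned x)
{
  if (x >= 4) return 32;
  return (-1U >> x) * 16;
}
```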

[Bug c/96550] gcc is smart in figuring out a non-returning function.

2020-08-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96550

--- Comment #1 from Marc Glisse  ---
Does -fno-delete-null-pointer-checks help?

[Bug target/50829] avx extra copy for _mm256_insertf128_pd

2020-08-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50829

Marc Glisse  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
  Known to work||10.1.0
 Resolution|--- |FIXED
  Known to fail||9.3.0

--- Comment #14 from Marc Glisse  ---
This was fixed (by Jakub I think).

[Bug rtl-optimization/48037] Missed optimization: unnecessary register moves

2020-08-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48037

--- Comment #10 from Marc Glisse  ---
We now generate just
sqrtpd  %xmm0, %xmm0
for 2 and 4,
sqrtpd  (%rdi), %xmm0
for 3, and
movupd  (%rdi), %xmm0
sqrtpd  %xmm0, %xmm0
for 1 (for alignment reasons I guess, the movu disappears with -mavx).

Should we close it as fixed? Or update the testcase to perform a different
operation that isn't recognized?

[Bug tree-optimization/96563] Failure to optimize loop with condition to simple arithmetic

2020-08-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96563

Marc Glisse  changed:

   What|Removed |Added

   Last reconfirmed||2020-08-11
 Ever confirmed|0   |1
   Severity|normal  |enhancement
 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization

--- Comment #1 from Marc Glisse  ---
At -O2 (before uncprop)

   [local count: 151290756]:

   [local count: 976138698]:
  # i_9 = PHI <0(2), i_4(6)>
  if (x_3(D) == i_9)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 922451069]:
  i_4 = i_9 + 1;
  if (i_4 != 10)
goto ; [89.42%]
  else
goto ; [10.58%]

   [local count: 97603126]:
  goto ; [100.00%]

   [local count: 824847943]:
  goto ; [100.00%]

   [local count: 53687628]:

   [local count: 151290756]:
  # _2 = PHI <8(7), 4(8)>
  return _2;

We don't really do anything special. At -O3, the loop gets unrolled

   [local count: 151290757]:
  if (x_3(D) == 0)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 8320992]:
  goto ; [100.00%]

   [local count: 142969766]:
  if (x_3(D) == 1)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 7863337]:
  goto ; [100.00%]

   [local count: 135106428]:
  if (x_3(D) == 2)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 7430853]:
  goto ; [100.00%]

   [local count: 127675576]:
  if (x_3(D) == 3)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 7022157]:
  goto ; [100.00%]

   [local count: 120653419]:
  if (x_3(D) == 4)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 6635938]:
  goto ; [100.00%]

   [local count: 114017483]:
  if (x_3(D) == 5)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 6270962]:
  goto ; [100.00%]

   [local count: 107746521]:
  if (x_3(D) == 6)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 5926059]:
  goto ; [100.00%]

   [local count: 101820460]:
  if (x_3(D) == 7)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 5600125]:
  goto ; [100.00%]

   [local count: 96220334]:
  if (x_3(D) == 8)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 5292118]:
  goto ; [100.00%]

   [local count: 90928219]:
  if (x_3(D) == 9)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 5001052]:
  goto ; [100.00%]

   [local count: 97603126]:

   [local count: 151290756]:
  # _2 = PHI <8(14), 4(12), 8(22), 8(21), 8(20), 8(19), 8(18), 8(17), 8(16),
8(15), 8(23)>
  return _2;

We have code in reassoc to handle x==0||x==1||x==2 and turn it into a range
test. I suspect the issue is related to those empty bbs between the conditions
and the PHI, which hide the fact that they are all jumping to the same place
eventually. Fixing this for the unrolled case is thus probably easiest,
although of course it would be nice if it also worked for 99 instead of 9,
where we are not going to unroll.
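
As a sketch of the transformation being discussed: the testcase itself is not shown in this comment, so the loop below is an assumed shape inferred from the dumps (ten equality tests against 0..9 yielding 8, fall-through yielding 4). The names `loop_form`, `range_form` and `check_range_form` are hypothetical. The range-test form relies on the usual trick of casting to unsigned so that negative values compare as large ones.

```c
#include <assert.h>

/* Assumed shape of the testcase, inferred from the GIMPLE dumps. */
int loop_form(int x)
{
  for (int i = 0; i < 10; i++)
    if (x == i)
      return 8;
  return 4;
}

/* Range-test form: x==0 || x==1 || ... || x==9 becomes one unsigned
   comparison, because negative x converts to a value above 9. */
int range_form(int x)
{
  return (unsigned) x <= 9u ? 8 : 4;
}

/* Compare both forms around the interesting range. */
void check_range_form(void)
{
  for (int x = -1000; x <= 1000; x++)
    assert(loop_form(x) == range_form(x));
}
```

The same rewrite would apply unchanged with 99 in place of 9, which is the case where unrolling does not happen.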
