[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084

--- Comment #3 from Jakub Jelinek  ---
I bet the associate code is really unprepared for unfolded trees, which weren't
around before delayed folding was introduced to the C and C++ FEs.
Unfortunately it isn't complete, because e.g. convert_to_integer_1 -> do_narrow
-> fold_build2_loc happily folds.

Anyway, quick fix could be not trying to reassociate TREE_CONSTANT parts:
--- gcc/fold-const.cc.jj	2024-01-26 00:07:58.0 +0100
+++ gcc/fold-const.cc	2024-02-24 09:38:40.150808529 +0100
@@ -908,6 +908,8 @@ split_tree (tree in, tree type, enum tre
   if (TREE_CODE (in) == INTEGER_CST || TREE_CODE (in) == REAL_CST
       || TREE_CODE (in) == FIXED_CST)
     *litp = in;
+  else if (TREE_CONSTANT (in))
+    *conp = in;
   else if (TREE_CODE (in) == code
            || ((! FLOAT_TYPE_P (TREE_TYPE (in)) || flag_associative_math)
                && ! SAT_FIXED_POINT_TYPE_P (TREE_TYPE (in))
@@ -956,8 +958,6 @@ split_tree (tree in, tree type, enum tre
       if (neg_var_p && var)
         *minus_varp = var, var = 0;
     }
-  else if (TREE_CONSTANT (in))
-    *conp = in;
   else if (TREE_CODE (in) == BIT_NOT_EXPR
            && code == PLUS_EXPR)
     {

So, the problem happens on
typedef unsigned _BitInt (__SIZEOF_INT__ * __CHAR_BIT__ - 1) T;
T a, b;

void
foo (void)
{
  b = (T) ((a | (-1U >> 1)) >> 1 | (a | 5) << 4);
}
when fold_binary_loc is called on (unsigned _BitInt(31)) a << 4 | 80 and
(unsigned _BitInt(31)) (2147483647 >> 1); the important part is that op0 has
the unsigned _BitInt(31) type, while op1 is a NOP_EXPR to that type from an
RSHIFT_EXPR done in type T (the typedef).
Soon BIT_IOR_EXPR folding is called on
(unsigned _BitInt(31)) a << 4 and 2147483647 >> 1 | 80 where the latter is all
in T type (fold_binary_loc does STRIP_NOPS).  Because split_tree prefers same
code over TREE_CONSTANT, this splits it into the LSHIFT_EXPR var0, RSHIFT_EXPR
con1 (because it is TREE_CONSTANT) and the T type 80 literal in lit1,
everything else is NULL.  As there are 3 objects, it reassociates.  We first
associate_tree the 0 vs. 1 cases, but that just moves the *1 into *0 because
their counterparts are NULL.
Both the RSHIFT_EXPR and the INTEGER_CST 80 have type T, but atype is the
non-typedef type built by build_bitint_type, so
          /* Eliminate lit0 and minus_lit0 to con0 and minus_con0. */
          con0 = associate_trees (loc, con0, lit0, code, atype);
returns a NOP_EXPR of the RSHIFT_EXPR | INTEGER_CST.  Then we associate_trees
the LSHIFT_EXPR with this result, and so it recurses infinitely.

Perhaps my patch above is an improvement anyway: if we know some subtree is
TREE_CONSTANT, all we need to do is wait for it to be constant folded (not sure
it always would be, e.g. because of division by zero or similar).  Trying to
reassociate its parts with other expressions might just scatter the constants
to other spots instead of keeping them together.

[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084

--- Comment #4 from Jakub Jelinek  ---
Though, I must say I'm not really sure why this wouldn't recurse infinitely
even without the casts.

[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084

--- Comment #5 from Jakub Jelinek  ---
Or perhaps the
  if (ok
      && ((var0 != 0) + (var1 != 0)
          + (minus_var0 != 0) + (minus_var1 != 0)
          + (con0 != 0) + (con1 != 0)
          + (minus_con0 != 0) + (minus_con1 != 0)
          + (lit0 != 0) + (lit1 != 0)
          + (minus_lit0 != 0) + (minus_lit1 != 0)) > 2)
condition should be amended to avoid the reassociation in cases where clearly
nothing good can come out of it, which is when the association doesn't
actually reshuffle anything.  If (var0 == 0) || (var1 == 0) (and similarly for
the other 5 pairs) and, ignoring the minus_* parts which would need more
thought, (con0 != 0 && lit0 != 0) || (con1 != 0 && lit1 != 0), then it
reassociates back to the original parts of op0 and the original parts of op1,
i.e. no change.  But how the minus_* parts play into this is harder.
If we want to be lazy, perhaps we could have a bool variable recording whether
there has been any association between subtrees from the original op0 and op1:
initially false, set whenever we associate_trees something that comes from op0
with something that comes from op1, and only do the final associate_trees if
it is set.  If it isn't, this should be folding of the individual suboperands,
not reassociation.

[Bug target/114083] Possible word play on conditional/unconditional

2024-02-24 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114083

--- Comment #5 from Andreas Schwab  ---
Enable conditional-move operations even if unsupported by hardware.

[Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Bug ID: 114086
   Summary: Boolean switches could have a lot better codegen,
possibly utilizing bit-vectors
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: janschultke at googlemail dot com
  Target Milestone: ---

https://godbolt.org/z/3acqbbn3E

enum struct E { a, b, c, d, e, f, g, h };

bool test_switch(E e) {
switch (e) {
case E::a:
case E::c:
case E::e:
case E::g: return true;
default: return false;
}
}


Expected output
===

test_switch(E):
  mov eax, edi
  and eax, 1
  ret



Actual output (-O3)
===

test_switch(E):
  xor eax, eax
  cmp edi, 6
  ja .L1
  mov eax, 85
  bt rax, rdi
  setc al
.L1:
  ret


Explanation
===

Boolean switches in general can be optimized a lot better than what GCC
currently does. Clang does find the optimization to a bitwise AND, although
this may be a big ask.

Generally, contiguous boolean switches (that is, switch statements where all
cases yield a boolean value and the labels are contiguous) can be optimized to
accessing a bit vector.

That switch could have been transformed into:

> return (0b01010101 >> int(e)) & 1;

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
  mov eax, edi
  and eax, 1
  ret
seems wrong without -fstrict-enums: one could call
test_switch(static_cast<E>(9)), and it should return false in that case.

[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084

--- Comment #6 from Jakub Jelinek  ---
As in the following patch, which is supposed to track the origin of the 6
something0 variables in bitmasks: bit value 1 means it comes (partly) from op0,
bit value 2 means it comes (partly) from op1.
--- gcc/fold-const.cc.jj	2024-02-24 09:49:09.098815803 +0100
+++ gcc/fold-const.cc	2024-02-24 11:01:34.266513041 +0100
@@ -11779,6 +11779,15 @@ fold_binary_loc (location_t loc, enum tr
               + (lit0 != 0) + (lit1 != 0)
               + (minus_lit0 != 0) + (minus_lit1 != 0)) > 2)
         {
+          int var0_origin = (var0 != 0) + 2 * (var1 != 0);
+          int minus_var0_origin
+            = (minus_var0 != 0) + 2 * (minus_var1 != 0);
+          int con0_origin = (con0 != 0) + 2 * (con1 != 0);
+          int minus_con0_origin
+            = (minus_con0 != 0) + 2 * (minus_con1 != 0);
+          int lit0_origin = (lit0 != 0) + 2 * (lit1 != 0);
+          int minus_lit0_origin
+            = (minus_lit0 != 0) + 2 * (minus_lit1 != 0);
           var0 = associate_trees (loc, var0, var1, code, atype);
           minus_var0 = associate_trees (loc, minus_var0, minus_var1,
                                         code, atype);
@@ -11791,15 +11800,19 @@ fold_binary_loc (location_t loc, enum tr

           if (minus_var0 && var0)
             {
+              var0_origin |= minus_var0_origin;
               var0 = associate_trees (loc, var0, minus_var0,
                                       MINUS_EXPR, atype);
               minus_var0 = 0;
+              minus_var0_origin = 0;
             }
           if (minus_con0 && con0)
             {
+              con0_origin |= minus_con0_origin;
               con0 = associate_trees (loc, con0, minus_con0,
                                       MINUS_EXPR, atype);
               minus_con0 = 0;
+              minus_con0_origin = 0;
             }

           /* Preserve the MINUS_EXPR if the negative part of the literal is
@@ -11815,15 +11828,19 @@ fold_binary_loc (location_t loc, enum tr
                   /* But avoid ending up with only negated parts.  */
                   && (var0 || con0))
                 {
+                  minus_lit0_origin |= lit0_origin;
                   minus_lit0 = associate_trees (loc, minus_lit0, lit0,
                                                 MINUS_EXPR, atype);
                   lit0 = 0;
+                  lit0_origin = 0;
                 }
               else
                 {
+                  lit0_origin |= minus_lit0_origin;
                   lit0 = associate_trees (loc, lit0, minus_lit0,
                                           MINUS_EXPR, atype);
                   minus_lit0 = 0;
+                  minus_lit0_origin = 0;
                 }
             }

@@ -11833,37 +11850,51 @@ fold_binary_loc (location_t loc, enum tr
             return NULL_TREE;

           /* Eliminate lit0 and minus_lit0 to con0 and minus_con0. */
+          con0_origin |= lit0_origin;
           con0 = associate_trees (loc, con0, lit0, code, atype);
-          lit0 = 0;
+          minus_con0_origin |= minus_lit0_origin;
           minus_con0 = associate_trees (loc, minus_con0, minus_lit0,
                                         code, atype);
-          minus_lit0 = 0;

           /* Eliminate minus_con0.  */
           if (minus_con0)
             {
               if (con0)
-                con0 = associate_trees (loc, con0, minus_con0,
-                                        MINUS_EXPR, atype);
+                {
+                  con0_origin |= minus_con0_origin;
+                  con0 = associate_trees (loc, con0, minus_con0,
+                                          MINUS_EXPR, atype);
+                }
               else if (var0)
-                var0 = associate_trees (loc, var0, minus_con0,
-                                        MINUS_EXPR, atype);
+                {
+                  var0_origin |= minus_con0_origin;
+                  var0 = associate_trees (loc, var0, minus_con0,
+                                          MINUS_EXPR, atype);
+                }
               else
                 gcc_unreachable ();
-              minus_con0 = 0;
             }

           /* Eliminate minus_var0.  */
           if (minus_var0)
             {
               if (con0)
-                con0 = associate_trees (loc, con0, minus_var0,
-                                        MINUS_EXPR, atype);
+                {
+                  con0_origin |= minus_var0_origin;
+                  con0 = associate_trees (loc, con0, minus_var0,
+                                          MINUS_EXPR, atype);
+                }
               else
g

[Bug rtl-optimization/114085] Internal (cross) compiler error when building libstdc++ for the H8/300 family

2024-02-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114085

Jonathan Wakely  changed:

   What|Removed |Added

  Component|libstdc++   |rtl-optimization

--- Comment #1 from Jonathan Wakely  ---
If the compiler crashes then that's a compiler bug, not a library bug.

Reassigning to rtl-optimization but that might not be accurate.

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #2 from Jan Schultke  ---
Yeah right, the actual optimal output (which clang finds) is:

> test_switch(E):
>   test edi, -7
>   sete al
>   ret


Testing with -7 also makes sure that the bit with value 8 and all higher bits
are zero.

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #3 from Jakub Jelinek  ---
And the rest boils down to what code to generate for
bool
foo (int x)
{
  return ((682 >> x) & 1);
}
Both that and switch from the #c0 testcase boil down to
  _1 = 682 >> x_2(D);
  _3 = (_Bool) _1;
or
  _6 = 682 >> _4;
  _8 = (_Bool) _6;
in GIMPLE dump.  Now, for the foo above, gcc emits
        movl    $682, %eax
        btl     %edi, %eax
        setc    %al
        ret
and clang emits the same:
        movl    $682, %eax              # imm = 0x2AA
        btl     %edi, %eax
        setb    %al
        retq
Though, e.g. clang 14 emitted
        movl    %edi, %ecx
        movl    $682, %eax              # imm = 0x2AA
        shrl    %cl, %eax
        andb    $1, %al
        retq
which is longer; I don't know which is faster.

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #4 from Jakub Jelinek  ---
But sure, confirmed for both:

int
foo (int e)
{
  switch (e)
{
case 1:
case 3:
case 5:
case 7:
case 9:
case 11:
case 13:
  return 1;
default:
  return 0;
}
}

int
bar (int e)
{
  switch (e)
{
case 1:
case 3:
case 5:
case 7:
case 9:
case 11:
case 13:
case 15:
  return 1;
default:
  return 0;
}
}

where in foo, because we emit the guarding
        cmpl    $13, %edi
        ja      .L1
we could just simplify it to andl $1 when e <= 13, and the bar case can indeed
be done by (e & -15) == 1.
Now, the question is if either of these optimizations should be done in the
switch lowering, or if we should do it elsewhere where it would optimize also
hand written code like that, if user writes it as
int
foo2 (int e)
{
  if (e <= 13U)
return (10922 >> e) & 1;
  else
return 0;
}

int
bar2 (int e)
{
  if (e <= 15U)
return (43690 >> e) & 1;
  else
return 0;
}
Looking at clang, it can optimize bar but can't optimize foo (it uses a switch
table rather than a shift, which is worse than what gcc emits), and it emits
pretty much what gcc emits for foo2/bar2.
Perhaps phiopt could handle this for the bar2 case, and match.pd using range
info for foo2?
The next question is what should be done if the 2 values aren't 1 and 0 but
0 and 1, or some cst and cst + 1, or cst and cst - 1 for some arbitrary
constant cst, or cst and 0, or 0 and cst, or cst1 and cst2: whether to emit
e.g. a conditional move, etc.

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #5 from Jan Schultke  ---
Well, it's not quite equivalent to either of the bit-shifts we've posted. To
account for shifting more than the operand size, it would be:

bool foo (int x)
{
  return x > 6 ? 0 : ((85 >> x) & 1);
}


This is exactly what GCC does and the branch can be explained by this range
check.

So I guess GCC already does optimize this to a bit-vector, it just doesn't find
the optimization to:

bool foo(int x)
{
return (x & -7) == 0;
}


This is very specific to this particular switch statement though. You could do
better than having a branch if the hardware supported a saturating shift, but
probably not on x86_64.

Never mind that; if anything, this isn't middle-end.

[Bug middle-end/114073] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5530

2024-02-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

--- Comment #2 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:5e7a176e88a2a37434cef9b1b6a37a4f8274854a

commit r14-9163-g5e7a176e88a2a37434cef9b1b6a37a4f8274854a
Author: Jakub Jelinek 
Date:   Sat Feb 24 12:44:34 2024 +0100

bitint: Handle VIEW_CONVERT_EXPRs between large/huge BITINT_TYPEs and
VECTOR/COMPLEX_TYPE etc. [PR114073]

The following patch implements support for VIEW_CONVERT_EXPRs from/to
large/huge _BitInt to/from vector or complex types or anything else but
integral/pointer types which doesn't need to live in memory.

2024-02-24  Jakub Jelinek  

PR middle-end/114073
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Handle
VIEW_CONVERT_EXPRs between large/huge _BitInt and non-integer/pointer
types like vector or complex types.
(gimple_lower_bitint): Don't merge VIEW_CONVERT_EXPRs to non-integral
types.  Fix up VIEW_CONVERT_EXPR handling.  Allow merging
VIEW_CONVERT_EXPR from non-integral/pointer types with a store.

* gcc.dg/bitint-93.c: New test.

[Bug middle-end/114073] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5530

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jakub Jelinek  ---
Fixed.

[Bug middle-end/113988] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5470

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113988
Bug 113988 depends on bug 114073, which changed state.

Bug 114073 Summary: during GIMPLE pass: bitintlower: internal compiler error: 
in lower_stmt, at gimple-lower-bitint.cc:5530
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Jakub Jelinek  changed:

   What|Removed |Added

  Component|middle-end  |tree-optimization

--- Comment #6 from Jakub Jelinek  ---
(In reply to Jan Schultke from comment #5)
> Well, it's not quite equivalent to either of the bit-shifts we've posted.

The #c4 foo2/bar2 are functionally equivalent to #c4 foo/bar; it is what gcc
actually emits for the latter.
x > 6 ? 0 : ((85 >> x) & 1)
isn't functionally equivalent to anything mentioned so far here, as it handles
negative values differently.

[Bug middle-end/113205] [14 Regression] internal compiler error: in backward_pass, at tree-vect-slp.cc:5346 since r14-3220

2024-02-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

--- Comment #13 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:0394ae31e832c5303f3b4aad9c66710a30c097f0

commit r14-9165-g0394ae31e832c5303f3b4aad9c66710a30c097f0
Author: Richard Sandiford 
Date:   Sat Feb 24 11:58:22 2024 +

vect: Tighten check for impossible SLP layouts [PR113205]

During its forward pass, the SLP layout code tries to calculate
the cost of a layout change on an incoming edge.  This is taken
as the minimum of two costs: one in which the source partition
keeps its current layout (chosen earlier during the pass) and
one in which the source partition switches to the new layout.
The latter can sometimes be arranged by the backward pass.

If only one of the costs is valid, the other cost was ignored.
But the PR shows that this is not safe.  If the source partition
has layout 0 (the normal layout), we have to be prepared to handle
the case in which that ends up being the only valid layout.

Other code already accounts for this restriction, e.g. see
the code starting with:

/* Reject the layout if it would make layout 0 impossible
   for later partitions.  This amounts to testing that the
   target supports reversing the layout change on edges
   to later partitions.

gcc/
PR tree-optimization/113205
* tree-vect-slp.cc (vect_optimize_slp_pass::forward_cost): Reject
the proposed layout if it does not allow a source partition with
layout 2 to keep that layout.

gcc/testsuite/
PR tree-optimization/113205
* gcc.dg/torture/pr113205.c: New test.

[Bug middle-end/113205] [14 Regression] internal compiler error: in backward_pass, at tree-vect-slp.cc:5346 since r14-3220

2024-02-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

Richard Sandiford  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Richard Sandiford  ---
Finally fixed.

[Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #7 from Jakub Jelinek  ---
Now, suppose we optimize the (0x >> x) & 1 case etc. to x & 1, provided a
suitable range of x.
For
int
bar3 (int e)
{
  if (e <= 15U)
return e & 1;
  else
return 0;
}
phiopt optimizes this into
  return e & 1 & (e <= 15U);
so I guess we want another match.pd optimization which would turn that into
(e & -15) == 1.

[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized variables

2024-02-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696

Richard Sandiford  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org

[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized variables

2024-02-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696

--- Comment #3 from Richard Sandiford  ---
Created attachment 57520
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57520&action=edit
Candidate patch

The attached patch seems to fix it.  I'm taking next week off, but I'll run the
patch through proper testing when I get back.

[Bug rtl-optimization/114087] New: RISC-V optimization on checking certain bits set ((x & mask) == val)

2024-02-24 Thread Explorer09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087

Bug ID: 114087
   Summary: RISC-V optimization on checking certain bits set ((x &
mask) == val)
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

It might be common in the C family of languages to check if certain bits are
set in an integer with a code pattern like this:

```c
unsigned int x;
if ((x & 0x3000) == 0x1000) {
  // Do something...
}
```

I was surprised that compilers like GCC and Clang don't realize they can use
bit shifts and inverted bit masks to save some instructions and emit smaller
code.

Here I present 3 possible optimizations that could be implemented in a
compiler. Two of them can apply not only to RISC-V, but other RISC
architectures as well (except ARM, perhaps). The last one is specific to RISC-V
due to the 20-bit immediate operand of its "lui" (load upper immediate)
instruction.

The bit masks should be compile-time constants, and the "-Os" flag (optimize
for size) is assumed.

### Test code

The example code and constants are crafted specifically for RISC-V.

Each group of `pred*` functions should behave identically (if not, please let
me know; it might be a typo).

* The "a" variants are what I commonly write for checking the set bits.
* The "b" variants are what I believe the compiler should ideally transform the
code to. I wrote them to let compiler developers know how the optimization can
be done. (But in practice the "b" code might transform to "a", meaning the
"optimization" direction reversed.)
* The "c" variants are hacks to make things work. They contain `__asm__
volatile` directives to force GCC or Clang to optimize in the direction I want.
The generated assembly should represent what I consider the ideal result.

```c
#include <stdbool.h>
#include <stdint.h>
#define POWER_OF_TWO_FACTOR(x) ((x) & -(x))

// ---
// Example 1: The bitwise AND mask contains lower bits in all ones.
// By converting the bitwise AND into a bitwise OR, an "addi"
// instruction can be saved.
// (This might conflict with optimizations utilizing RISC-V "bclri"
// instruction; use one or the other.)
// (In ARM there are "bic" instructions already, making this
// optimization useless.)

static uint32_t mask1 = 0x5FFF;
static uint32_t val1  = 0x14501DEF;
// static_assert((mask1 & val1) == val1);
// static_assert((mask1 & 0xFFF) == 0xFFF);

bool pred1a(uint32_t x) {
  return ((x & mask1) == val1);
}

bool pred1b(uint32_t x) {
  return ((x | ~mask1) == (val1 | ~mask1));
}

bool pred1c(uint32_t x) {
  register uint32_t temp = x | ~mask1;
  __asm__ volatile ("" : "+r" (temp));
  return (temp == (val1 | ~mask1));
}

// ---
// Example 2: The bitwise AND mask could fit an 11-bit immediate
// operand of RISC-V "andi" instruction with a help of right
// shifting. (Keep the sign bit of the immediate operand zero.)
// (This kind of optimization could also work with other RISC 
// architectures, except ARM.)

static uint32_t mask2 = 0x5550;
static uint32_t val2  = 0x1450;

// static_assert(mask2 != 0);
// static_assert((mask2 & val2) == val2);
// static_assert(mask2 / POWER_OF_TWO_FACTOR(mask2) <= 0x7FF);

bool pred2a(uint32_t x) {
  return ((x & mask2) == val2);
}

bool pred2b(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
  return ((x >> __builtin_ctz(factor)) & (mask2 / factor))
== (val2 / factor);
}

bool pred2c(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
  register uint32_t temp = x >> 20;
  __asm__ volatile ("" : "+r" (temp));
  return (temp & 0x555) == 0x145;
}

// ---
// Example 3: The bitwise AND mask could fit a 20-bit immediate
// operand of RISC-V "lui" instruction.
// Only RISC-V has this 20-bit immediate "U-type" format, AFAIK.

static uint32_t mask3 = 0x0005;
static uint32_t val3  = 0x00045014;

// static_assert(mask3 / POWER_OF_TWO_FACTOR(mask3) <= 0xF);

bool pred3a(uint32_t x) {
  return ((x & mask3) == val3);
}

bool pred3b(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
  return (((x / factor) << 12) & ((mask3 / factor) << 12))
== ((val3 / factor) << 12);
}

bool pred3c(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
  register uint32_t temp = x << 12;
  __asm__ volatile ("" : "+r" (temp));
  return (temp & ((mask3 / factor) << 12))
== ((val3 / factor) << 12);
}
```

I tested the code in the Compiler Explorer (godbolt.org).

### Generated assembly (for reference only)

```
pred1a:
li  a5,1431658496
addi a5,a5,-1
and a0,a0,a5
li  a5,340795392
```

[Bug rtl-optimization/114062] "GNAT BUG DETECTED" 13.2.0 (hppa-linux-gnu) in remove, at alloc-pool.h:437

2024-02-24 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114062

John David Anglin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #4 from John David Anglin  ---
Not reproducible.

[Bug c/114088] New: Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslen

2024-02-24 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

Bug ID: 114088
   Summary: Please provide __builtin_c16slen and __builtin_c32slen
to complement __builtin_wcslen
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: thiago at kde dot org
  Target Milestone: ---

Actually, GCC doesn't have __builtin_wcslen, but Clang does. Providing these
extra two builtins would allow implementing __builtin_wcslen too. The names are
not part of the C standard, but follow the current naming construction rules
for it, similar to how "mbrtowc" and "wcslen" parallel.

My specific need is actually to implement char16_t string containers in C++.
I'm particularly interested in QString/QStringView, but this applies to
std::basic_string{_view} too.

For example:

std::string_view f1() { return "Hello"; }
std::wstring_view fw() { return L"Hello"; }
std::u16string_view f16() { return u"Hello"; }
std::u32string_view f32() { return U"Hello"; }

With GCC and libstdc++, the first function produces optimal code:
        movl    $5, %eax
        leaq    .LC0(%rip), %rdx
        ret

For wchar_t case, GCC emits an out-of-line call to wcslen:
        pushq   %rbx
        leaq    .LC2(%rip), %rbx
        movq    %rbx, %rdi
        call    wcslen@PLT
        movq    %rbx, %rdx
        popq    %rbx
        ret

The next two, because of the absence of a C library function, emit a loop:
        xorl    %eax, %eax
        leaq    .LC1(%rip), %rcx
.L4:
        incq    %rax
        cmpw    $0, (%rcx,%rax,2)
        jne     .L4
        movq    %rcx, %rdx
        ret

Clang, meanwhile, emits optimal code for all four and so did the pre-Clang
Intel compiler. See https://gcc.godbolt.org/z/qvj7qnYbz. MSVC emits optimal for
the char and wchar_t versions, but loops for the other two.

Clang gives up when the string gets longer, though. See
https://gcc.godbolt.org/z/54j3zr6e6. That indicates that it gave up on
computing the loop's trip count at compile time, and that it would do better
if the intrinsic were present.

[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslen

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug middle-end/114087] RISC-V optimization on checking certain bits set ((x & mask) == val)

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||missed-optimization
  Component|rtl-optimization|middle-end

[Bug testsuite/114089] New: FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)

2024-02-24 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089

Bug ID: 114089
   Summary: FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess
errors)
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: danglin at gcc dot gnu.org
CC: rsandifo at gcc dot gnu.org
  Target Milestone: ---
  Host: hppa64-hp-hpux11.11
Target: hppa64-hp-hpux11.11
 Build: hppa64-hp-hpux11.11

This test fails on hppa64-hp-hpux11.11.  Test lacks "target aarch64*-*-*"
restriction.

[Bug testsuite/114089] FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||jakub at gcc dot gnu.org
 Resolution|--- |FIXED
Version|13.2.1  |14.0

--- Comment #1 from Jakub Jelinek  ---
See r14-9165 ?

[Bug testsuite/114089] FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089

--- Comment #2 from Jakub Jelinek  ---
I mean r14-9162 , sorry.

[Bug tree-optimization/114090] New: forwprop -fwrapv miscompilation

2024-02-24 Thread kristerw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Bug ID: 114090
   Summary: forwprop -fwrapv miscompilation
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The function f below returns an incorrect result for INT_MIN when compiled with
-O1 -fwrapv for X86_64:


__attribute__((noipa)) int f(int x) {
int w = (x >= 0 ? x : 0);
int y = -x;
int z = (y >= 0 ? y : 0);
return w + z;
}

int
main ()
{
  if (f(0x8000) != 0)
__builtin_abort ();
  return 0;
}


What is happening is that forwprop has optimized

  w_2 = MAX_EXPR ;
  y_3 = -x_1(D);
  z_4 = MAX_EXPR ;
  _5 = w_2 + z_4;
  return _5;

to

  _5 = ABS_EXPR ;
  return _5;

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=94920
   Keywords||wrong-code
   Target Milestone|--- |13.3
Summary|forwprop -fwrapv|[13/14 Regression] forwprop
   |miscompilation  |-fwrapv miscompilation

--- Comment #1 from Andrew Pinski  ---
The pattern:
/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
(simplify
  (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
  (abs @0))

introduced by r13-1785-g633e9920589ddf .

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-24
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
There most likely should be this check added:
ANY_INTEGRAL_TYPE_P (type) && !TYPE_OVERFLOW_WRAPS (type)

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

--- Comment #3 from Jakub Jelinek  ---
Both the patterns look wrong for TYPE_OVERFLOW_WRAPS, and the first one also for
TYPE_UNSIGNED (the second one is OK for TYPE_UNSIGNED but doesn't make much
sense there; we should have folded it to 0).  Of course, the first one is
unlikely to trigger for TYPE_UNSIGNED because MAX  should have
been folded to 0.
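The problem with the second pattern, (x <= 0 ? -x : 0) -> max(-x, 0), can be checked with a small sketch. It again assumes GCC's wrapping conversion from unsigned int to int as a stand-in for -fwrapv; the function names are made up for illustration.

```c
#include <limits.h>

/* Hypothetical helper: -x with wraparound, computed in unsigned int and
   converted back (implementation-defined; GCC wraps modulo 2^N).  */
static int wrap_neg2 (int x) { return (int) (0u - (unsigned) x); }

/* Source form: (x <= 0 ? -x : 0).  */
static int cond_form (int x) { return x <= 0 ? wrap_neg2 (x) : 0; }

/* Folded form: MAX_EXPR <-x, 0>.  */
static int max_form (int x)
{
  int n = wrap_neg2 (x);
  return n >= 0 ? n : 0;
}

/* Unsigned case: x <= 0 only holds for x == 0, where -x is also 0, so the
   source form is always 0 and folding it to a MAX_EXPR gains nothing.  */
static unsigned cond_form_u (unsigned x) { return x <= 0 ? -x : 0; }
```

With wrapping, -INT_MIN is INT_MIN, so cond_form returns INT_MIN while max_form returns 0.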

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
I'd go with
--- gcc/match.pd.jj	2024-02-22 10:09:48.678446435 +0100
+++ gcc/match.pd	2024-02-24 19:23:32.201014245 +0100
@@ -453,8 +453,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

 /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
 (simplify
-  (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
-  (abs @0))
+ (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
+ (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
+  (abs @0)))

 /* X * 1, X / 1 -> X.  */
 (for op (mult trunc_div ceil_div floor_div round_div exact_div)
@@ -4218,8 +4219,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

 /* (x <= 0 ? -x : 0) -> max(-x, 0).  */
 (simplify
-  (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
-  (max @2 @1))
+ (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
+ (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
+  (max @2 @1)))

 /* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
 (for op (bit_xor bit_ior plus)

[Bug fortran/66499] Letters with accents change format behavior for X and T descriptors.

2024-02-24 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66499

--- Comment #7 from Jerry DeLisle  ---
There are two issues going on here. We do not interpret source code that is
UTF-8 encoded.  This is why in our current tests for UTF-8 encoding of data
files we use hexadecimal codes.

I will have to see what the standard says about non-ASCII character sets in
source code.

If I work around this by using something like this:

char1 = 4_"Test without local char"
char2 = 4_"Test with local char "

char2(22:22) = 4_"Ã"
char2(23:23) = 4_"Ã"

$ ./a.out 
  23
  23
1234567890123456789012345678901234567890
  Test without local char  10.
  Test with local char ÃÃ10.

The string lengths now match correctly.  One can see the tabbing is still off.
This is because the format buffer seek functions are byte oriented, and when
using UTF-8 encoding we need to seek the buffer differently. In fact we have to
allocate it differently as well to maintain the four-byte characters.
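A minimal illustration in C (not libgfortran's actual code) of why byte-oriented positioning drifts: in UTF-8 text the byte count exceeds the character count by one for each continuation byte, so a T or X position computed in bytes lands that many columns off. The helper name is hypothetical, and counting code points still ignores combining marks and double-width glyphs.

```c
#include <stddef.h>

/* Hypothetical helper: number of code points in a valid UTF-8 string.
   Continuation bytes have the bit pattern 10xxxxxx and are skipped, so
   each multi-byte sequence counts as a single character.  */
static size_t utf8_chars (const char *s)
{
  size_t n = 0;
  for (const unsigned char *p = (const unsigned char *) s; *p; ++p)
    if ((*p & 0xC0) != 0x80)
      ++n;
  return n;
}
```

For example, "char \xC3\xA9" (ending in U+00E9, é) is 7 bytes but 6 characters, so one accented letter shifts a byte-based tab position by one column. Storing each character in a fixed four-byte unit avoids that drift, since position and index then coincide.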

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #5 from Jakub Jelinek  ---
Adding && !TYPE_OVERFLOW_SANITIZED (type) is IMHO not needed, because for
INT_MIN the expression triggers UB both before and after either transformation.

[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw

2024-02-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

--- Comment #1 from Jonathan Wakely  ---
GCC built-ins like __builtin_strlen just wrap a libc function. __builtin_wcslen
would generally just be a call to wcslen, which doesn't give you much. I assume
what you want is to recognize wcslen and replace it with inline assembly code.

Similarly, if libc doesn't provide c16slen then a __builtin_c16slen isn't going
to do much.

I think what you want is better code for finding char16_t(0) or char32_t(0),
not a new built-in.
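When libc has no such function, the code GCC would need to emit is just a scan for the terminating zero code unit. A minimal portable sketch; the names my_c16slen and my_c32slen are hypothetical and are not GCC built-ins or libc functions.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical portable fallbacks: count code units up to the terminating
   char16_t(0) / char32_t(0).  These count UTF-16 code units, not
   characters, so a surrogate pair counts as 2.  */
static size_t my_c16slen (const char16_t *s)
{
  const char16_t *p = s;
  while (*p)
    ++p;
  return (size_t) (p - s);
}

static size_t my_c32slen (const char32_t *s)
{
  const char32_t *p = s;
  while (*p)
    ++p;
  return (size_t) (p - s);
}
```

A faster version would scan several code units per iteration (e.g. with SIMD), which is the "better code" such a loop could be lowered to; the plain loop only fixes the semantics.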

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #6 from Jakub Jelinek  ---
Created attachment 57521
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57521&action=edit
gcc14-pr114090.patch

Full untested patch.

[Bug c++/114091] New: gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootsrap fails in my example context

2024-02-24 Thread markmigm at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091

Bug ID: 114091
   Summary: gcc/config/aarch64/aarch64.cc has code requiring c++14
instead of c++11, so g++14 bootsrap fails in my
example context
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: markmigm at gmail dot com
  Target Milestone: ---

[I'm not sure where gcc/config/aarch64/aarch64.cc fits in the
component alternatives. Feel free to correct that if I got it
wrong.] 

gcc bootstrap is based on c++11, which predates the constructors for pair
being constexpr. The gcc/config/aarch64/aarch64.cc specific code can fail
because of using pair constructors where constant expressions are required:

/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc:13095:50:
error: constexpr variable 'tiles' must be initialized by a constant expression
static constexpr std::pair tiles[] = {
^ ~
/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc:13096:5:
note: non-constexpr constructor 'pair' cannot be used in a
constant expression
{ 0xff, 'b' },
^

This stops the bootstrap in the example context.

This was detected when clang was doing the bootstrapping on FreeBSD.
For reference:

c++ -std=c++11  -fPIC -c   -g -DIN_GCC   -fno-strict-aliasing -fno-exceptions
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wno-format -Wmissing-format-attribute -Wconditionally-supported
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -fPIC -I. -I.
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/.
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../include 
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libcpp/include
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libcody
-I/usr/local/include 
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libdecnumber
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libdecnumber/bid
-I../libdecnumber
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libbacktrace 
-DLIBICONV_PLUG -o aarch64.o -MT aarch64.o -MMD -MP -MF ./.deps/aarch64.TPo
/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc

where clang used its libc++ (it's a FreeBSD context):

/usr/include/c++/v1/__utility/pair.h:225:5: note: declared here
  225 | pair(_U1&& __u1, _U2&& __u2)
  | ^

having -std=c++11 on the command line. That results in the pair
constructors not being constexpr in libc++.

It would appear that, until the gcc bootstrap is intended to be based on
c++14 (or later), the reported gcc/config/aarch64/aarch64.cc code
presumes a post-c++11 context when it should not.

[Bug target/113763] [14 Regression] build fails with clang++ host compiler because aarch64.cc uses C++14 constexpr.

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113763

Andrew Pinski  changed:

   What|Removed |Added

 CC||markmigm at gmail dot com

--- Comment #19 from Andrew Pinski  ---
*** Bug 114091 has been marked as a duplicate of this bug. ***

[Bug target/114091] gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootsrap fails in my example context

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
This has already been fixed, over 2 weeks ago.

>20240114

You are using a GCC 14 snapshot from a month ago even. Please try a newer
snapshot before reporting a bug next time.

*** This bug has been marked as a duplicate of bug 113763 ***

[Bug target/114091] gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootsrap fails in my example context

2024-02-24 Thread markmigm at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091

--- Comment #2 from Mark Millard  ---
(In reply to Andrew Pinski from comment #1)
> This has already been fixed, over 2 weeks ago.
> 
> >20240114
> 
> You are using a GCC 14 snapshot from a month ago even. Please try a newer
> snapshot before reporting a bug next time.
> 
> *** This bug has been marked as a duplicate of bug 113763 ***

Sorry. I was building a FreeBSD port and I'm not a port maintainer, much
less one for FreeBSD's lang/gcc14-devel .

I've sent the port maintainer a copy of your reply. Thanks.

[Bug tree-optimization/114092] New: ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092

Bug ID: 114092
   Summary: ADD_OVERFLOW with resulting type of `_Complex
unsigned:1` should be reduced to just `(unsigned)(a)
<= 1`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
struct d
{
  unsigned i:1;
};
_Bool f(int a, struct d b)
{
return __builtin_add_overflow_p(a, 0, b.i);
}


_Bool f1(int a, struct d b)
{
return a != 1 && a != 0;
}
```

These 2 functions should produce the same code. Here `a+0` overflows an
`unsigned:1` if the value of a is not 0 or 1.

We could extend this to any smaller types too if we want.
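The claimed equivalence can be checked on small inputs with a plain reference model that does not use __builtin_add_overflow_p at all, so it also builds with compilers that lack that built-in. A sketch; the helper names are hypothetical.

```c
#include <stdbool.h>

/* Reference model (hypothetical helper): does the value a fit in an
   n-bit unsigned bit-field, i.e. is 0 <= a <= 2^n - 1?  For n == 1 this is
   exactly "a + 0 does not overflow unsigned:1".  Valid for n <= 62.  */
static bool fits_unsigned_bits (long long a, unsigned n)
{
  return a >= 0 && a <= (long long) ((1ULL << n) - 1);
}

/* f1 from the report: overflow iff a is neither 0 nor 1.  */
static bool f1_overflow (int a) { return a != 1 && a != 0; }

/* The single unsigned comparison from the summary: true iff a is 0 or 1,
   i.e. the negation of the overflow flag.  */
static bool fits_in_1bit (int a) { return (unsigned) a <= 1; }
```

Extending this to wider bit-fields, as suggested, amounts to replacing the constant 1 with 2^n - 1 in the unsigned comparison.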

[Bug tree-optimization/114092] ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092

--- Comment #1 from Andrew Pinski  ---
I should note that LLVM (LLVM does not have __builtin_add_overflow_p) is able
to optimize:
```
_Bool f2(int a, struct d b, unsigned _BitInt(1) t)
{
return __builtin_add_overflow(a, 0, &t);
}
```
into f1.

[Bug tree-optimization/114092] ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Guess all of .ADD_OVERFLOW (x, 0), .ADD_OVERFLOW (0, x) and .SUB_OVERFLOW (x, 0)
could be folded, with REALPART_EXPR = (type) x and IMAGPART_EXPR = (type) x != x.
We just need to figure out for which types it is beneficial and for which it
isn't.
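The proposed fold can be compared against what __builtin_add_overflow computes today. A minimal sketch for an unsigned char result type (the choice of type is arbitrary); the two function names are hypothetical, and the built-in is the GCC/Clang one.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed fold: the real part becomes (type) x
   and the imaginary (overflow) part becomes (type) x != x.  */
static bool add0_overflow_model (int x, unsigned char *res)
{
  *res = (unsigned char) x;   /* realpart: (type) x       */
  return *res != x;           /* imagpart: (type) x != x  */
}

/* What the built-in computes for x + 0 stored into unsigned char.  */
static bool add0_overflow_builtin (int x, unsigned char *res)
{
  return __builtin_add_overflow (x, 0, res);
}
```

Both report overflow exactly when x is outside [0, 255] and store the same truncated value; .SUB_OVERFLOW (x, 0) would follow the same model.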

[Bug tree-optimization/114093] New: Canonicalization of `a == -1 || a == 0`

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114093

Bug ID: 114093
   Summary: Canonicalization of `a == -1 || a == 0`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
_Bool f1(int a)
{
return a == -1 || a == 0;
}

_Bool f0(signed a)
{
a = -a;
return a == 1 || a == 0;
}


_Bool f(unsigned a)
{
return a == -1u || a == 0;
}

_Bool f3(unsigned a)
{
a = -a;
return a == 1 || a == 0;
}


_Bool f2(unsigned a)
{
return (-a) <= 1;
}
```

These all should produce the exact same code as they are all equivalent (if we
ignore the (undefined) overflow possibility for f0).

This is more about canonicalizations rather than anything else.

Though I will note that on the riscv and mips targets, f is worse than the
others.

LLVM's canonical form seems to be `((unsigned)a) + 1 <= 1`.
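The equivalence of the unsigned variants (which have no overflow caveat) and of the LLVM form can be spot-checked directly. The functions below are copies of f, f3 and f2 from the report under new names, plus the LLVM expression; the names are only for this sketch.

```c
#include <stdbool.h>

/* Unsigned variants from the report (well defined for every input).  */
static bool eq_f (unsigned a) { return a == -1u || a == 0; }
static bool eq_f3 (unsigned a) { a = -a; return a == 1 || a == 0; }
static bool eq_f2 (unsigned a) { return (-a) <= 1; }

/* The form LLVM reportedly canonicalizes to.  */
static bool eq_llvm (unsigned a) { return a + 1 <= 1; }
```

All four are true exactly for a == 0 and a == UINT_MAX, since each maps those two values (and only those) into {0, 1}.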

[Bug target/100799] Stackoverflow in optimized code on PPC

2024-02-24 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

--- Comment #27 from Peter Bergner  ---
(In reply to Jakub Jelinek from comment #26)
> But I still think the workaround is possible on the callee side.
> Sure, if the DECL_HIDDEN_STRING_LENGTH argument(s) is(are) used in the
> function, then there is no easy way but expect the parameter save area (ok,
> sure, it could just load from the assumed parameter location and don't
> assume the rest is there, nor allow storing to the slots it loaded them
> from).
> But that is actually not what BLAS etc. suffers from.
[snip]
> So, the workaround could be for the case of unused DECL_HIDDEN_STRING_LENGTH
> arguments at the end of PARM_DECLs don't try to load those at all and don't
> assume there is parameter save area unless the non-DECL_HIDDEN_STRING_LENGTH
> or used DECL_HIDDEN_STRING_LENGTH arguments actually require it.
So I looked closer at what the failure mode was in this PR (versus the one
you're seeing with flexiblas).  As in your case, there is a mismatch in the
number of parameters the C caller thinks there are (8 args, so no param save
area needed) versus what the Fortran callee thinks there are (9 params which
include the one hidden arg, so there is a param save area).  The Fortran
function doesn't actually access the hidden argument in our test case above, in
fact the character argument is never used either.  What I see in the rtl dumps
is that *all* incoming args have a REG_EQUIV generated that points to the param
save area (this doesn't happen when there are 8 or fewer formal params), even
for the first 8 args that are passed in registers:

(insn 2 12 3 2 (set (reg/v/f:DI 117 [ r3 ])
(reg:DI 3 3 [ r3 ])) "callee-3.c":6:1 685 {*movdi_internal64}
 (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
(const_int 32 [0x20])) [1 r3+0 S8 A64])
(nil)))
(insn 3 2 4 2 (set (reg/v:DI 118 [ r4 ])
(reg:DI 4 4 [ r4 ])) "callee-3.c":6:1 685 {*movdi_internal64}
 (expr_list:REG_EQUIV (mem/c:DI (plus:DI (reg/f:DI 99 ap)
(const_int 40 [0x28])) [2 r4+0 S8 A64])
(nil)))
...

We then get to RA and we end up spilling one of the pseudos associated with one
of the other parameters (not the character param JOB).  LRA then uses that
REG_EQUIV note and rather than allocating a new stack slot to spill to, it uses
the parameter save memory location for that parameter for the spill slot.  When
we store to that memory location and the C caller has not allocated the param
save area, we end up clobbering an important part of the C caller's stack,
causing a crash.

If we were to try and do a callee workaround, we would need to disable setting
those REG_EQUIV notes for the parameters... if that's even possible.  Since
Fortran uses call-by-reference parameter passing, isn't the updated param value from
the callee returned in the parameter save area itself???


> Doing the workaround on the caller side is impossible, this is for calls
> from C/C++ to Fortran code, directly or indirectly called and there is
> nothing the compiler could use to guess that it actually calls Fortran code
> with hidden Fortran character arguments.
As a HUGE hammer, every caller could always allocate a param save area.  That
would "fix" the problem from this bug, but would that also fix the bug you're
seeing in flexiblas?

I'm not advocating this though.  I was thinking maybe making callers (under an
option?) conservatively assume the callee is a Fortran function and for those C
arguments that could map to a Fortran parameter with a hidden argument, bump
the number of counted args by 1.  For example, a C function with 2 char/char *
args and 6 int args would think there are 8 normal args and 2 hidden args, so
it needs to allocate a param save area.  Is that not feasible?  ...or does that
not even address the issue you're seeing in your bug?
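The arithmetic behind the proposal can be sketched under a deliberately simplified model of the ELFv2 calling convention: every argument takes one of the eight GPR argument slots (r3-r10), and floating-point, vector and aggregate arguments are ignored. The function names are hypothetical and are not GCC internals.

```c
#include <stdbool.h>

/* Simplified model (assumption): each argument occupies one GPR argument
   slot, and a parameter save area is needed once they no longer fit.  */
enum { GPR_ARG_SLOTS = 8 };

static bool needs_param_save_area (unsigned nargs)
{
  return nargs > GPR_ARG_SLOTS;
}

/* Hypothetical conservative caller-side count: every char / char *
   argument might carry a hidden Fortran string-length argument.  */
static unsigned conservative_arg_count (unsigned nargs, unsigned nchar_args)
{
  return nargs + nchar_args;
}
```

With 2 char-like and 6 int arguments, the caller sees 8 arguments (no save area), while the conservative count is 10, which would force a param save area, matching the example above.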

[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw

2024-02-24 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #2 from Xi Ruoyao  ---
(In reply to Jonathan Wakely from comment #1)
> GCC built-ins like __builtin_strlen just wrap a libc function. 
> __builtin_wcslen would generally just be a call to wcslen, which doesn't give 
> you much.

But __builtin_strlen *does* get optimized when the input is a string literal. 
Not sure about wcslen though.

[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw

2024-02-24 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

--- Comment #3 from Thiago Macieira  ---
> But __builtin_strlen *does* get optimized when the input is a string literal. 
>  Not sure about wcslen though.

It appears not to, in the test above. std::char_traits<wchar_t>::length() calls
wcslen() whereas the char specialisation uses __builtin_strlen() explicitly.
But if the intrinsics are enabled, the two would be the same, wouldn't they?

Anyway, in the absence of a library function to call, inserting the loop is
fine; it's what is there already.

Though it would be nice to be able to provide such a function. I wrote it for
Qt (it's called qustrlen). I would try with __builtin_constant_p first to see
if the string is a literal.