[Bug rtl-optimization/67635] [SH] ifcvt missed optimization

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67635

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-08
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #4 from Andrew Pinski  ---
Confirmed.

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

David Binderman  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #7 from David Binderman  ---
(In reply to Andrew Pinski from comment #4)
> Confirmed.
> 
> I am 99% sure it was r14-2946-g46c8c225455273

Perhaps it would be worthwhile to ask Richard to have a look.

[Bug tree-optimization/96433] Failed to optimize (A / N) * N <= A

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96433

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org

--- Comment #5 from Andrew Pinski  ---
Maybe we can just pattern match these simple cases, which seems to be what
clang does:
(simplify
 (le:c @0 (mult:c (div @0 @1) @1))
 { true_value; })
(simplify
 (gt:c @0 (mult:c (div @0 @1) @1))
 { false_value; })

That is, these are all optimized:
//const __SIZE_TYPE__ N = 3;

int foo(__SIZE_TYPE__ len, __SIZE_TYPE__ N) {
  __SIZE_TYPE__ newlen = (len / N) * N;
  return newlen <= len;
}
int foo0(__SIZE_TYPE__ len, __SIZE_TYPE__ N) {
  __SIZE_TYPE__ newlen = (len / N) * N;
  return newlen > len;
}
int foo1(__SIZE_TYPE__ len, __SIZE_TYPE__ N) {
  __SIZE_TYPE__ newlen = (len / N) * N;
  return !(len >= newlen);
}
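
As a sanity check on the identity behind these functions, here is a minimal sketch that is not part of the bug report. It brute-forces small unsigned values and confirms that `(len / N) * N <= len` holds whenever `N` is nonzero. The helper name `identity_holds` is hypothetical, and the check says nothing about how the fold should be written in match.pd.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true if (len / n) * n <= len for every
// len in [0, max_len] and every n in [1, max_n].  Unsigned division
// truncates, so the product can never exceed len.
bool identity_holds(std::size_t max_len, std::size_t max_n) {
  for (std::size_t len = 0; len <= max_len; ++len) {
    for (std::size_t n = 1; n <= max_n; ++n) {
      std::size_t newlen = (len / n) * n;
      if (newlen > len)
        return false;
    }
  }
  return true;
}
```

Calling `identity_holds(200, 50)` exercises every pair in that range; `n` starts at 1 because division by zero is undefined.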

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

Richard Biener  changed:

   What|Removed |Added

 CC||stefansf at linux dot ibm.com
 Ever confirmed|0   |1
   Last reconfirmed||2023-08-08
 Status|UNCONFIRMED |WAITING

--- Comment #1 from Richard Biener  ---
I think this was reported before (and fixed by r14-2932-g41ef5a34161356).  Can
you try again with updated GCC?

[Bug c++/110912] False assumption that constructors cannot alias any of their parameters

2023-08-08 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110912

--- Comment #2 from Jan Schultke  ---
(In reply to Jiang An from comment #1)
> The restriction against aliasing was intended, see
> https://cplusplus.github.io/CWG/issues/2271.html.
> 
> The status quo seems to be that in the body of `A::A(int &x)`, compilers can
> assume that the value of `x` won't be changed by a modification on `*this`,
> but not the other way around.

Then this status quo is not correctly implemented, because in the example, GCC
assumes that a change of `x` (see `x = 5`) cannot alter `this->i` (see `i == 0`
assumed to be always true).

It is not enough to put `__restrict` on the parameters; a much weaker modifier
must be used for this purpose. At most, a "one-way `__restrict`" must be used,
if such a thing exists.

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #8 from Richard Biener  ---
I will have a look.

[Bug target/110899] RFE: Attributes preserve_most and preserve_all

2023-08-08 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899

--- Comment #10 from Florian Weimer  ---
(In reply to Michael Matz from comment #9)
> > > I don't see how that helps.  Imagine a preserve_all function foo that
> > > calls printf.  How do you propose that 'foo' saves all parts of the SSE
> > > registers, even those that aren't invented yet, or those that can't be
> > > touched by the current ISA?  (printf might clobber all of these)
> > 
> > Vector registers are out of scope for this.
> 
> Why do you say that?  From clang: "Furthermore it also preserves all
> floating-point registers (XMMs/YMMs)."  (for preserve_all, but this
> bugreport does include that variant of the attribute).

Ugh, I preferred not to look at it because it's likely that the Clang
implementation is broken (not future-proof).

> > But let's look at APX. If printf is recompiled to use APX, then it will
> > clobber the extended register file. If we define __preserve_most__ the way
> > we do in my psABI proposal (i.e., *not* as everything but %r11), the
> > extended APX registers are still caller-saved.
> 
> Right, for preserve_most _with your wording_ it works out.  preserve_all
> or preserve_most with clang wording doesn't.

In glibc, we already use a full context switch with XSAVE for the dynamic
loader trampoline. As far as I understand it, it's not future-proof. The kernel
could provide an interface that is guaranteed to work because it only enables
those parts of the register file that it can context-switch. I can probably get
the userspace-only implementation into glibc, but the kernel interface seems
unlikely. We'd also have to work out the interaction of preserve_all and
unwinding, setjmp etc.; not sure if there is a proper solution for that.

[Bug tree-optimization/110941] New: [14 Regression] Dead Code Elimination Regression at -O3 since r14-2379-gc496d15954c

2023-08-08 Thread scherrer.sv at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110941

Bug ID: 110941
   Summary: [14 Regression] Dead Code Elimination Regression at
-O3 since r14-2379-gc496d15954c
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scherrer.sv at gmail dot com
  Target Milestone: ---

static int a;
void foo(void);
void bar349_(void);
void bar363_(void);
void bar275_(void);
int main() {
  {
    {
      short b = 26;
      for (; b >= 1; b = b - 4) {
        if (b >= 2 && b <= 26)
          bar275_();
        if (a)
          bar363_();
        if (a)
          bar349_();
        int c = b;
        if (!(c >= 2 && c <= 26))
          foo();
      }
    }
    a = 0;
  }
}

gcc-25c4b1620eb (trunk) -O3 cannot eliminate the call to foo but
gcc-releases/gcc-13.1.0 -O3 can.
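
To see why the call to foo is dead in the test case above, the following sketch (an illustration added here, not part of the report) replays the loop's induction variable and checks the guard in front of foo() for each value. The helper names `visited_values` and `foo_is_unreachable` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: replays the induction variable of the reduced test
// case and records each value of b seen inside the loop body.
std::vector<int> visited_values() {
  std::vector<int> seen;
  for (short b = 26; b >= 1; b = b - 4)
    seen.push_back(b);
  return seen;
}

// True if every visited value satisfies 2 <= c <= 26, i.e. the condition
// guarding foo() in the test case is never true.
bool foo_is_unreachable() {
  for (int c : visited_values())
    if (!(c >= 2 && c <= 26))
      return false;
  return true;
}
```

The loop only ever sees 26, 22, 18, 14, 10, 6 and 2, so a compiler that tracks the value range of b can prove the guard false on every iteration.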
---
gcc-25c4b1620ebc10fceabd86a34fdbbaf8037e7e82 -O3 case.c -S -o case.s
- OUTPUT -
main:
.LFB0:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movl    $24, %ebx
        .p2align 4,,10
        .p2align 3
.L8:
        cmpw    $24, %bx
        jbe     .L2
        movl    a(%rip), %edx
        testl   %edx, %edx
        jne     .L3
.L7:
        call    foo
        xorl    %eax, %eax
        popq    %rbx
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        movl    $0, a(%rip)
        ret
        .p2align 4,,10
        .p2align 3
.L3:
        .cfi_restore_state
        call    bar363_
        movl    a(%rip), %eax
        testl   %eax, %eax
        je      .L7
        call    bar349_
        jmp     .L7
.L2:
        call    bar275_
        cmpl    $0, a(%rip)
        jne     .L15
        .p2align 4,,10
        .p2align 3
.L6:
        subl    $4, %ebx
        cmpl    $-4, %ebx
        jne     .L8
        movl    $0, a(%rip)
        xorl    %eax, %eax
        popq    %rbx
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
.L15:
        .cfi_restore_state
        call    bar363_
        cmpl    $0, a(%rip)
        je      .L6
        call    bar349_
        cmpl    $24, %ebx
        jbe     .L6
        jmp     .L7
-- END OUTPUT -

---
gcc-2b98cc24d6af0432a74f6dad1c722ce21c1f7458 -O3 case.c -S -o case.s
- OUTPUT -
main:
.LFB0:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movl    $24, %ebx
.L10:
        cmpw    $24, %bx
        jbe     .L2
        movl    a(%rip), %edx
        testl   %edx, %edx
        jne     .L17
.L3:
        movl    $0, a(%rip)
        xorl    %eax, %eax
        popq    %rbx
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
        .p2align 4,,10
        .p2align 3
.L17:
        .cfi_restore_state
        call    bar363_
        movl    a(%rip), %eax
        testl   %eax, %eax
        jne     .L4
.L15:
        subl    $4, %ebx
        jmp     .L10
        .p2align 4,,10
        .p2align 3
.L4:
        call    bar349_
        jmp     .L15
.L2:
        call    bar275_
        cmpl    $0, a(%rip)
        jne     .L18
        .p2align 4,,10
        .p2align 3
.L9:
        subl    $4, %ebx
        cmpw    $-4, %bx
        jne     .L10
        jmp     .L3
.L18:
        call    bar363_
        cmpl    $0, a(%rip)
        je      .L9
        call    bar349_
        jmp     .L9
-- END OUTPUT -

---
Bisects to r14-2379-gc496d15954c

[Bug tree-optimization/110942] New: [14 Regression] Dead Code Elimination Regression at -O3 since r14-1165-g257c2be7ff8

2023-08-08 Thread scherrer.sv at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110942

Bug ID: 110942
   Summary: [14 Regression] Dead Code Elimination Regression at
-O3 since r14-1165-g257c2be7ff8
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: scherrer.sv at gmail dot com
  Target Milestone: ---

static int b, c;
static short d;
static char e;
void bar49_(void);
void bar115_(void);
void bar42_(void);
void foo(void);
static short(a)(short f, short g) { return f + g; }
static int h() {
  short i = 0;
  char j;
  int k;
l:
  k = 0;
  if (i)
    bar42_();
  i = 0;
  for (; i != 8; i = a(i, 8)) {
    if (e)
      bar49_();
    if (0 >= i)
      c = 0;
    if (!k) {
      j = 6;
      for (; j >= 0; j--) {
        k = 0;
        for (; k <= 0; k++) {
          if (!(j >= 5))
            bar115_();
          for (; d; d = 0) {
            if (!j)
              foo();
            if (j)
              break;
          }
          if (i)
            break;
        }
        if (c)
          return j;
        c = 2;
        if (b)
          goto l;
      }
    }
  }
  return 0;
}
int main() { h(); }

gcc-25c4b1620eb (trunk) -O3 cannot eliminate the call to foo but
gcc-releases/gcc-13.1.0 -O3 can.
---
gcc-25c4b1620ebc10fceabd86a34fdbbaf8037e7e82 -O3 case.c -S -o case.s
- OUTPUT -
main:
.LFB2:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        xorl    %eax, %eax
        movl    $6, %ebx
.L2:
        movl    %eax, c(%rip)
        cmpb    $4, %bl
        jne     .L18
.L7:
        call    bar115_
        cmpw    $0, d(%rip)
        je      .L8
        testb   %bl, %bl
        jne     .L8
        call    foo
        movl    c(%rip), %edx
        xorl    %eax, %eax
        movw    %ax, d(%rip)
        testl   %edx, %edx
        jne     .L5
        movl    $2, c(%rip)
.L5:
        xorl    %eax, %eax
        popq    %rbx
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
.L18:
        .cfi_restore_state
        testl   %eax, %eax
        jne     .L5
        subl    $1, %ebx
        movl    $2, %eax
        jmp     .L2
.L8:
        cmpl    $0, c(%rip)
        jne     .L5
        movl    $2, c(%rip)
        subl    $1, %ebx
        cmpb    $-1, %bl
        jne     .L7
        jmp     .L5
-- END OUTPUT -

---
gcc-2b98cc24d6af0432a74f6dad1c722ce21c1f7458 -O3 case.c -S -o case.s
- OUTPUT -
main:
.LFB2:
        .cfi_startproc
        movl    $2, c(%rip)
        xorl    %eax, %eax
        ret
-- END OUTPUT -

---
Bisects to r14-1165-g257c2be7ff8

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #9 from Richard Biener  ---
Hm, OK.  So I actually thought of this issue.  It's basically a hole in the
requirement of all function "exits" having a virtual use and thus virtual
PHIs being created along the paths to it.  I was aware of GIMPLE_RESX and
thought about const noreturn (but that attribute combination is rejected).

IPA pure-const detects func_15 as looping 'const' and noreturn:

Function is locally const.
Function found to be noreturn: func_15
Function found to be looping const: func_15/2
Declaration updated to be looping const: func_15/2

So there are two ways to "fix" this.  One is to drop the assumption that we
have PHIs on the paths to function exit (including in not returning regions).
That makes

virtual_operand_live::get_live_in (basic_block bb)
{
...
  /* Since we don't have a virtual PHI we can now pick any of the
     incoming edges liveout value.  All returns from the function have
     a virtual use forcing generation of virtual PHIs.  */
  edge_iterator ei;
  edge e;
  FOR_EACH_EDGE (e, ei, bb->preds)
    if (liveout[e->src->index])
      {
        if (EDGE_PRED (bb, 0) != e)
          liveout[EDGE_PRED (bb, 0)->src->index] = liveout[e->src->index];
        return liveout[e->src->index];
      }

instead required to check each edge and we can't simply take the value
from the immediate dominator as fallback.  When we discover divergence
we need to return NULL (unknown - nobody created the actually live VOP).

The other alternative is to make sure we _do_ have the virtual operand.
Either by making sure the "invalid" combination of const + noreturn
isn't detected or by creating a virtual use in that case anyway
(and fixup code because of that inconsistency).

[Bug target/110943] New: RISC-V: vmv.v.x and vmv.s.x pattern combine error

2023-08-08 Thread lehua.ding at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110943

Bug ID: 110943
   Summary: RISC-V: vmv.v.x and vmv.s.x pattern combine error
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lehua.ding at rivai dot ai
  Target Milestone: ---

Consider this code (https://godbolt.org/z/TKoqjxTPh):

```c
#include 

void foo9 (void *base, void *out, size_t vl)
{
    int64_t scalar = *(int64_t*)(base + 100);
    vint64m2_t v = __riscv_vmv_v_x_i64m2 (0, 1);
    *(vint64m2_t*)out = v;
}
```

the asm of GCC:

```asm
foo9:
        vsetvli a5,zero,e64,m2,ta,ma
        vmv.v.i v2,0
        vsetivli zero,1,e64,m2,ta,ma
        vse64.v v2,0(a1)
        ret
```

the asm of LLVM:

```asm
foo9:   # @foo9
        vsetivli zero, 1, e64, m2, ta, ma
        vmv.v.i v8, 0
        vs2r.v  v8, (a1)
        ret
```

I think GCC changes the semantics of the full store `*(vint64m2_t*)out = v;`.
If storing the second element of v would raise a memory exception, LLVM's code
triggers it but GCC's code does not.

Confirmed on GCC 13.2.0 and Trunk.

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

--- Comment #10 from Richard Biener  ---
Created attachment 55705
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55705&action=edit
prototype

This implements option two - Honza, I think the pure-const hunk is sound but I
didn't find a proper "lattice" for modref so had to modify the pure/const
setting places (three, ugh) - is there a better way to tell modref the
"base" lattice value is pure and not const?

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread panchenghui at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #2 from Chenghui Pan  ---
(In reply to Richard Biener from comment #1)
> I think this was reported before (and fixed by r14-2932-g41ef5a34161356). 
> Can you try again with updated GCC?

OK, I cloned a fresh copy of the source and am bootstrapping on loongarch64 now.

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

Xi Ruoyao  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|WAITING |RESOLVED

--- Comment #3 from Xi Ruoyao  ---


*** This bug has been marked as a duplicate of bug 110867 ***

[Bug rtl-optimization/110867] [14 Regression] ICE in combine after 7cdd0860949c6c3232e6cff1d7ca37bb5234074c

2023-08-08 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110867

Xi Ruoyao  changed:

   What|Removed |Added

 CC||panchenghui at loongson dot cn

--- Comment #6 from Xi Ruoyao  ---
*** Bug 110939 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/110867] [14 Regression] ICE in combine after 7cdd0860949c6c3232e6cff1d7ca37bb5234074c

2023-08-08 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110867

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #7 from Xi Ruoyao  ---
Can we close it as fixed now?

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

--- Comment #11 from Richard Biener  ---
Variant (1) is just the following, but I'm unsure as to the possible effects of
that ... (test coverage will be very low anyway)

diff --git a/gcc/tree-ssa-operands.cc b/gcc/tree-ssa-operands.cc
index 57e393ae164..c8ac98b4e06 100644
--- a/gcc/tree-ssa-operands.cc
+++ b/gcc/tree-ssa-operands.cc
@@ -696,7 +696,8 @@ operands_scanner::maybe_add_call_vops (gcall *stmt)
   /* A 'pure' or a 'const' function never call-clobbers anything.  */
   if (!(call_flags & (ECF_PURE | ECF_CONST)))
add_virtual_operand (opf_def);
-  else if (!(call_flags & ECF_CONST))
+  else if (!(call_flags & ECF_CONST)
+  || (call_flags & ECF_NORETURN))
add_virtual_operand (opf_use);
 }
 }

[Bug tree-optimization/110942] [14 Regression] Dead Code Elimination Regression at -O3 since r14-1165-g257c2be7ff8

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110942

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug tree-optimization/110941] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2379-gc496d15954c

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110941

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug target/98784] [11/12/13/14 Regression] problematic build of uClibc with -fPIC

2023-08-08 Thread wbx at openadk dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98784

Waldemar Brodkorb  changed:

   What|Removed |Added

 CC||wbx at openadk dot org

--- Comment #18 from Waldemar Brodkorb  ---
Hi,

This still happens with GCC 13.2.0.
You can boot a shell and then in strace you see a segfault error:

[pid28] fstat64(3, {st_mode=S_IFDIR|S_ISVTX|0777, st_size=400, ...}) = 0
[pid28] brk(0x154000)   = 0x154000
[pid28] getdents64(3, 0xefb11b80 /* 20 entries */, 4096) = 496
[pid28] brk(0x155000)   = 0x155000
[pid28] lstat64("./init", {st_mode=S_IFLNK|0777, st_size=10, ...}) = 0
[pid28] lstat64("./var", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
[pid28] lstat64("./usr", {st_mode=S_IFDIR|0755, st_size=120, ...}) = 0
[pid28] lstat64("./tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...}) =
0
[pid28] lstat64("./sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid28] lstat64("./sbin", {st_mode=S_IFDIR|0755, st_size=1420, ...}) = 0
[pid28] lstat64("./run", {st_mode=S_IFDIR|0777, st_size=40, ...}) = 0
[pid28] lstat64("./root", {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
[pid28] lstat64("./proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
[pid28] lstat64("./mnt", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
[pid28] lstat64("./media", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...})
= 0
[pid28] lstat64("./linuxrc", {st_mode=S_IFLNK|0777, st_size=11, ...}) = 0
[pid28] lstat64("./lib", {st_mode=S_IFDIR|0755, st_size=260, ...}) = 0
[pid28] lstat64("./etc", {st_mode=S_IFDIR|0755, st_size=640, ...}) = 0
[pid28] lstat64("./dev", {st_mode=S_IFDIR|0755, st_size=640, ...}) = 0
[pid28] lstat64("./boot", {st_mode=S_IFDIR|0755, st_size=40, ...}) = 0
[pid28] lstat64("./bin", {st_mode=S_IFDIR|0755, st_size=1920, ...}) = 0
[pid28] getdents64(3, 0xefb11b80 /* 0 entries */, 4096) = 0
[pid28] close(3)= 0
[pid28] write(1, "\33[1;34mbin\33[m  \33[1;34metc\33[m"..., 109bin 
etc  linuxrc  proc sbin usr
) = 109
[pid28] write(1, "\33[1;34mboot\33[m \33[1;36minit\33["..., 109boot
init mediaroot sys  var
) = 109
[pid28] write(1, "\33[1;34mdev\33[m  \33[1;34mlib\33[m"..., 90dev 
lib  mnt  run  tmp
) = 90
[pid28] exit_group(0)   = ?
[pid28] +++ exited with 0 +++
<... rt_sigsuspend resumed>)= ? ERESTARTNOHAND (To be restarted if
no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=28, si_uid=0,
si_status=0, si_utime=0, si_stime=3 /* 0.03 s */} ---
getrusage(RUSAGE_CHILDREN, {ru_utime={tv_sec=0, tv_usec=0}, ru_stime={tv_sec=0,
tv_usec=0}, ...}) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED,
NULL) = 28
getrusage(RUSAGE_CHILDREN, {ru_utime={tv_sec=0, tv_usec=8000},
ru_stime={tv_sec=0, tv_usec=32000}, ...}) = 0
wait4(-1, 0xefbe678c, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child
processes)
sigreturn({mask=[INT RT_1 RT_8 RT_15 RT_21 RT_23 RT_31]}) = -1 (errno 629)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV +++

Any tips on how to debug this?

best regards
 Waldemar

[Bug libstdc++/108053] std::visit_format_arg should hide __int128 and other extensions behind a handle

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108053

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-08-08

--- Comment #1 from Jonathan Wakely  ---
This applies to all the extended floating-point types too.

[Bug c/96788] "integer constant is so large that it is unsigned" warning is incorrect

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96788

Jonathan Wakely  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84764

--- Comment #7 from Jonathan Wakely  ---
Is this a dup of PR 84764?

[Bug c++/100805] __int128 should be disabled for non-extended -std= options

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100805

Jonathan Wakely  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84764

--- Comment #4 from Jonathan Wakely  ---
For C2x and C++23 __int128 can be an extended integer type, so we don't need to
pretend it doesn't exist.

[Bug libstdc++/110944] New: std::variant & optional GDB representation is too verbose

2023-08-08 Thread sebastian.redl at getdesigned dot at via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110944

Bug ID: 110944
   Summary: std::variant & optional GDB representation is too
verbose
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sebastian.redl at getdesigned dot at
  Target Milestone: ---

The key lines of the GDB visualizers for std::variant and std::optional look
like this:

return "%s [no contained value]" % self.typename

Note that this contains the full typename of the object.

We use lots of optionals of Boost.Units types, which have *huge* types. Here's
a small snippet of output when I attempt to print a value:

landing_mass_ =
std::optional >, boost::units::dimensionless_type>,
boost::units::homogeneous_system > >,
boost::units::list > > > > > > > > >, void>, double>> =
{[contained value] = 45359.2371}

Note that this is just one field of a struct with many fields (quite a few of
them optionals of unit types), and the value I'm printing is a vector of
multiple such structs, and the type is repeated for every single instance of an
optional.

It is, in short, not useful; in fact it makes the debug output completely
unusable.

Please just remove it. If I need to know the type of something, I can easily
look it up in the source code if I'm unsure about it. The debugger needs to
focus on runtime values, which I cannot look up.

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

--- Comment #12 from Richard Biener  ---
(In reply to Richard Biener from comment #11)
> Variant (1) is just the following, but I'm unsure as to the possible effects
> of that ... (test coverage will be very low anyway)
> 
> diff --git a/gcc/tree-ssa-operands.cc b/gcc/tree-ssa-operands.cc
> index 57e393ae164..c8ac98b4e06 100644
> --- a/gcc/tree-ssa-operands.cc
> +++ b/gcc/tree-ssa-operands.cc
> @@ -696,7 +696,8 @@ operands_scanner::maybe_add_call_vops (gcall *stmt)
>/* A 'pure' or a 'const' function never call-clobbers anything.  */
>if (!(call_flags & (ECF_PURE | ECF_CONST)))
> add_virtual_operand (opf_def);
> -  else if (!(call_flags & ECF_CONST))
> +  else if (!(call_flags & ECF_CONST)
> +  || (call_flags & ECF_NORETURN))
> add_virtual_operand (opf_use);
>  }
>  }

That ICEs for example tree-ssa/cunroll-4.c because that inserts
__builtin_unreachable () calls and this function is explicitly
marked const noreturn (if you use that combo manually you get a diagnostic).
The ICE is because cunroll doesn't add virtual operands or update SSA.

But it also shows that sinking into __builtin_unreachable () ending regions
will have the same issue and the ipa pure-const/modref fix won't fix that.

I'm going to try fixing the live problem :/

[Bug c++/110938] [11/12/13/14 Regression] miscompile if implicit special member is deleted and mutable

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110938

--- Comment #3 from Jonathan Wakely  ---
The 4.9.0 regression to return Y in a register seems to have happened with:

commit 3b6d16993b9d6812f6212bce4f35547fd9e40457 [r0-126146-g3b6d16993b9d68]
Author: Vladimir Makarov
Date:   Wed Oct 30 14:27:25 2013

regmove.c: Remove.

But I don't think that will have affected the __is_trivially_copyable(Y)
result.

[Bug libstdc++/110944] std::variant & optional GDB representation is too verbose

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110944

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-08
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread panchenghui at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #4 from Chenghui Pan  ---
(In reply to Richard Biener from comment #1)
> I think this was reported before (and fixed by r14-2932-g41ef5a34161356). 
> Can you try again with updated GCC?

I still get the exact same ICE message with updated GCC. (The commit I use for
now is 25c4b1620ebc10fceabd86a34fdbbaf8037e7e82, with same configure options.)

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

Xi Ruoyao  changed:

   What|Removed |Added

 Resolution|DUPLICATE   |---
 Ever confirmed|1   |0
 Status|RESOLVED|UNCONFIRMED

--- Comment #5 from Xi Ruoyao  ---
Reopen then.

[Bug libstdc++/110945] New: std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

Bug ID: 110945
   Summary: std::basic_string::assign dramatically slower than
other means of copying memory
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: janschultke at googlemail dot com
  Target Milestone: ---

See https://quick-bench.com/q/bqGjfyd180oOlJhiY_XnURMNKG8

Using the copy constructor performs best, and ends up using std::memcpy
internally. Even using .resize() and std::copy is much faster than .assign(),
because it is subject to more partial loop unrolling.

basic_string::assign:
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.h#L1713C28-L1713C28

this calls the four-iterator form of .replace():
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.h#L2378

this calls this form of _M_replace_dispatch(): (I think)
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.tcc#L430

this calls _M_replace():
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.tcc#L507

in this case, it should call _S_move():
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.h#L431

this calls char_traits::move():
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/char_traits.h#L223

and that calls __builtin_memcpy()

However, I must have followed this chain of calls incorrectly, because I do not
see a call to memmove in the output assembly, and most of the time is spent
here:

>        nopl   (%rax)
>        movdqa 0x42d8a0(%rdx),%xmm0
> 63.27% movups %xmm0,(%rax,%rdx,1)
> 36.69% add    $0x10,%rdx
> 0.03%  cmp    $0x10,%rdx

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

Jan Schultke  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #1 from Jan Schultke  ---
Oops, I meant that it calls __builtin_memmove(). Well, neither memmove nor
memcpy is visible in the output.

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #2 from Jan Schultke  ---
Also it looks like GCC doesn't emit memcpy or memmove in either of the first
benchmarks. Those statements refer to the corresponding clang output, actually.
What's consistent for both compilers is that .assign() is dramatically slower
than any other method.

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #3 from Jan Schultke  ---
When increasing the input size to 8 MiB, the results become more similar to
what clang delivers for 1 MiB too:
https://quick-bench.com/q/DFHYW6eZq-FAif8xuLkBOPwzYWA

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread stefansf at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #6 from Stefan Schulze Frielinghaus ---
I tried to reproduce it with a cross compiler while using the reproducer from
PR110867 without getting an ICE.  Can you attach a preprocessed source file
and the corresponding gcc invocation?

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #4 from Jonathan Wakely  ---
Please provide the testcase in a usable form, not just a link to an external
site (which uses its own custom benchmark macros). This is requested at
https://gcc.gnu.org/

The relevant code is:

#include 
#include 

auto make_bytes() {
  std::array result;
  for (int i = 0; i < result.size(); i++) {
    result[i] = i % 256;
  }
  return result;
}

auto raw_data = make_bytes();

static void BenchmarkAssign() {
  std::string target_data;
  target_data.assign(raw_data.begin(), raw_data.end());
}

This is not equivalent to the other forms of copying in the benchmark, because
string::assign has to handle possible aliasing. It's valid to do things like
str.assign(str.data()+1, str.data()+2).

The copy constructor doesn't have to deal with that case (the input can't
possibly alias the object under construction). The assignment operator form
also just uses the copy constructor followed by a cheap move. For the
resize+std::copy form there's a precondition on std::copy that the start of the
output range is not in the input range, which rules out the problematic forms
of aliasing.
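
To make the aliasing case concrete, here is a minimal sketch that feeds a string a range taken from its own buffer. The helper name assign_from_own_buffer and its parameters are made up for illustration; only the assign call itself comes from the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: assigns the sub-range [first, last) of the string's
// own buffer back to the string. The source range aliases the destination,
// which is the case assign(first, last) has to handle.
std::string assign_from_own_buffer(std::string s, std::size_t first,
                                   std::size_t last) {
    assert(first <= last && last <= s.size());
    s.assign(s.data() + first, s.data() + last);
    return s;
}
```

Calling it with "abcdef", 1, 2 corresponds to the str.assign(str.data()+1, str.data()+2) form and leaves just the one character at index 1.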

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #5 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #4)
> Please provide the testcase in a usable form, not just a link to an external
> site (which uses its own custom benchmark macros). This is requested at
> https://gcc.gnu.org/

Oops, I meant https://gcc.gnu.org/bugs.html

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

Jonathan Wakely  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-08-08
 Status|UNCONFIRMED |NEW

--- Comment #6 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #4)
> because string::assign has to handle possible aliasing. It's valid to do
> things like str.assign(str.data()+1, str.data()+2).

Even more problematic is something like:

str.assign(make_move_iterator(str.begin()), make_move_iterator(str.begin()+n));

or something similar with a counted_iterator wrapped in a common_iterator.

We can't rely on the iterators not being (convertible to) const char* to decide
that they don't alias the existing content, because arbitrary iterator types
could still alias the characters in the string.
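
As a rough illustration of that point, the sketch below wraps the string's own iterators in std::move_iterator. The iterator type is no longer a pointer or a string iterator, yet the range still refers to the string's characters. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper: the iterators passed to assign are move_iterators,
// not const char*, but they still alias the characters of s.
std::string assign_through_move_iterators(std::string s, std::size_t first,
                                          std::size_t last) {
    assert(first <= last && last <= s.size());
    typedef std::string::difference_type diff_t;
    s.assign(std::make_move_iterator(s.begin() + static_cast<diff_t>(first)),
             std::make_move_iterator(s.begin() + static_cast<diff_t>(last)));
    return s;
}
```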

It might be better to not use replace(begin(), end(), first, last) for
assign(first, last) though, because when we're replacing the entire string we
don't need the code that decides if we need to shuffle the existing content
around.

And _M_replace_dispatch creates a new copy anyway:

  _M_replace_dispatch(const_iterator __i1, const_iterator __i2,
                      _InputIterator __k1, _InputIterator __k2,
                      std::__false_type)
  {
    // _GLIBCXX_RESOLVE_LIB_DEFECTS
    // 2788. unintentionally require a default constructible allocator
    const basic_string __s(__k1, __k2, this->get_allocator());
    const size_type __n1 = __i2 - __i1;
    return _M_replace(__i1 - begin(), __n1, __s._M_data(),
                      __s.size());
  }

So maybe assign(InputIterator, InputIterator) could just do:

basic_string&
assign(_InputIterator __first, _InputIterator __last)
{ return assign(basic_string(__first, __last)); }

We know _M_replace_dispatch will make a copy anyway.
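
Outside the library, the same idea can be sketched with the public API only. This is an illustration under assumptions, not the libstdc++ change: assign_via_temporary is a made-up free function, and passing dest.get_allocator() to the temporary is my guess at a reasonable way to keep the allocator.

```cpp
#include <string>
#include <utility>

// Hypothetical free-function version of "build a temporary, then assign".
// The temporary is fully constructed before dest is touched, so it does not
// matter whether [first, last) aliases dest.
template <typename InputIt>
std::string& assign_via_temporary(std::string& dest, InputIt first,
                                  InputIt last) {
    std::string tmp(first, last, dest.get_allocator());
    dest = std::move(tmp);  // cheap in C++11 and later
    return dest;
}
```

Because the copy happens up front, a call such as assign_via_temporary(s, s.begin() + 6, s.end()) is safe even though the source range lives inside s.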

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #7 from Jonathan Wakely  ---
(except with correct allocator propagation)

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #8 from Jan Schultke  ---
(In reply to Jonathan Wakely from comment #4)
> Please provide the testcase in a usable form, not just a link to an external
> site (which uses its own custom benchmark macros). This is requested at
> https://gcc.gnu.org/

Thanks, I will remember to do that in the future.

> This is not equivalent to the other forms of copying in the benchmark,
> because string::assign has to handle possible aliasing. It's valid to do
> things like str.assign(str.data()+1, str.data()+2).

From what I could read in the `char_traits::move` code that presumably gets
called, this function explicitly tests for overlap between the memory regions,
and dispatches to cheap functions if possible. The input size was 8 MiB, so it
is unlikely that the overhead from this overlap detection is contributing in
any relevant capacity.

Basically, due to this overlap testing, `assign` SHOULD be just as fast as
other methods if there is no overlap (and in this case, there clearly is none).
However, it is 14x slower, so something is off.

Either I haven't followed the logic correctly, or there is a mistake in this
dispatching logic which leads to much worse performance for .assign().

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread panchenghui at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #7 from Chenghui Pan  ---
Created attachment 55706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55706&action=edit
preprocessed file of gcc/tree-cfgcleanup.cc, ICE occurred in this place.

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread panchenghui at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #8 from Chenghui Pan  ---
(In reply to Stefan Schulze Frielinghaus from comment #6)
> I tried to reproduce it with a cross compiler while using the reproducer
> from PR110867 without getting an ICE.  Can you attach a pre processed source
> file and a corresponding gcc invocation?

I have attached a preprocessed file that triggers the ICE when bootstrapping.
Sorry for not adding it at first.

The command used to compile this file is:
/home/panchenghui/upstream-unmodded/stuff/gcc/./prev-gcc/xg++ -save-temps
-B/home/panchenghui/upstream-unmodded/stuff/gcc/./prev-gcc/
-B/home/panchenghui/upstream-unmodded/install/loongarch64-unknown-linux-gnu/bin/
-nostdinc++
-B/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/src/.libs
-B/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs

-I/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/include/loongarch64-unknown-linux-gnu

-I/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/include
 -I/home/panchenghui/upstream-unmodded/gcc/libstdc++-v3/libsupc++
-L/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/src/.libs
-L/home/panchenghui/upstream-unmodded/stuff/gcc/prev-loongarch64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
 -fno-PIE -c   -g -O2 -fno-checking -gtoggle -DIN_GCC -fno-exceptions
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -fno-PIE -I. -I.
-I/home/panchenghui/upstream-unmodded/gcc/gcc
-I/home/panchenghui/upstream-unmodded/gcc/gcc/.
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../include 
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../libcpp/include
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../libcody 
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../libdecnumber
-I/home/panchenghui/upstream-unmodded/gcc/gcc/../libdecnumber/dpd
-I../libdecnumber -I/home/panchenghui/upstream-unmodded/gcc/gcc/../libbacktrace
  -o tree-cfgcleanup.o -MT tree-cfgcleanup.o -MMD -MP -MF
./.deps/tree-cfgcleanup.TPo
/home/panchenghui/upstream-unmodded/gcc/gcc/tree-cfgcleanup.cc

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #9 from Jonathan Wakely  ---
That improves things:


Benchmark                   Time          CPU   Iterations

BenchmarkInit           38136 ns     38069 ns        18243
BenchmarkAssignmentOp   45476 ns     45382 ns        15038
BenchmarkAssign         45767 ns     45653 ns        15457
BenchmarkCopy           56617 ns     56515 ns        11721

[Bug other/110946] New: 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Bug ID: 110946
   Summary: 3x perf regression with -Os on M1 Pro
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dave.rodgman at arm dot com
  Target Milestone: ---

Please see
https://github.com/Mbed-TLS/mbedtls/pull/7784/commits/6cfd9b54ae0d06451c1a46a10e57fa099878bb03
for details.

On M1 Pro, under -Os, we see a 3.1x performance regression for AES-XTS, which
can be solved by forcing -O2 for two functions. For comparison, clang -Os gives
around 5% perf regression (which is more in the ballpark that I'd expect). So
it looks to me like gcc is getting something wrong when compiling these two
functions with -Os.

We measured a smaller but still significant difference (20-25%) on x86-64.

Affects all versions of gcc that I was able to test with (9 .. 12).

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #10 from Jonathan Wakely  ---
(In reply to Jan Schultke from comment #8)
> From what I could read in the `char_traits::move` code that presumably gets
> called, this function explicitly tests for overlap between the memory
> regions, and dispatches to cheap functions if possible. The input size was 8
> MiB, so it is unlikely that the overhead from this overlap detection is
> contributing in any relevant capacity.

I think you're reading it wrong. The overlap detection in char_traits::move is
only for constant evaluation, because we can't use memmove.

The overlap detection that matters here is in _M_replace, long before we use
char_traits::move.

> Basically, due to this overlap testing, `assign` SHOULD be just as fast as
> other methods if there is no overlap (and in this case, there clearly is
> none). However, it is 14x slower, so something is off.
> 
> Either I haven't followed the logic correctly, or there is a mistake in this
> dispatching logic which leads to much worse performance for .assign().

Or the optimizers don't optimize away all the checks in _M_replace and so we
don't unroll everything to a simple memmove, but do all the runtime checks
every time. Which is what I think is happening.
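
For readers following along, the kind of runtime overlap test under discussion can be sketched as below. This is a minimal illustration and not the code in _M_replace; points_outside is a made-up name.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: returns true if p does not point into the current
// contents of s (the range [data(), data() + size()]).
// std::less is used because it gives a total order over pointers, even for
// pointers into unrelated objects, which the built-in < does not guarantee.
bool points_outside(const std::string& s, const char* p) {
    std::less<const char*> lt;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    return lt(p, first) || lt(last, p);
}
```

A check like this costs only a couple of comparisons, which fits the point made above: the slowdown would come from the surrounding checks and copies not being optimized away, not from the comparison itself.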

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #11 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #6)
> And _M_replace_dispatch creates a new copy anyway:
> 
>   _M_replace_dispatch(const_iterator __i1, const_iterator __i2,
> _InputIterator __k1, _InputIterator __k2,
> std::__false_type)
>   {
>   // _GLIBCXX_RESOLVE_LIB_DEFECTS
>   // 2788. unintentionally require a default constructible allocator
>   const basic_string __s(__k1, __k2, this->get_allocator());
>   const size_type __n1 = __i2 - __i1;
>   return _M_replace(__i1 - begin(), __n1, __s._M_data(),
> __s.size());
>   }

When distance(k1, k2) > this->capacity() this function will make two copies of
[k1,k2) and allocate twice. So even with the checks for disjunct strings, we do
a lot more work than the copy construction benchmarks.

With this change we make a single allocation+copy and then do a cheap move
assignment:

--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -1711,4 +1711,4 @@
 basic_string&
 assign(_InputIterator __first, _InputIterator __last)
-{ return this->replace(begin(), end(), __first, __last); }
+{ return *this = basic_string(__first, __last, get_allocator()); }

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #1 from Dave Rodgman  ---
Disassembly under -Os:

139c :
139c:   a9b67bfd    stp     x29, x30, [sp, #-160]!
13a0:   910003fd    mov     x29, sp
13a4:   a9046bf9    stp     x25, x26, [sp, #64]
13a8:   aa0003f9    mov     x25, x0
13ac:   90000000    adrp    x0, 0 <__stack_chk_guard>
13b0:   a90153f3    stp     x19, x20, [sp, #16]
13b4:   f9400000    ldr     x0, [x0]
13b8:   a9025bf5    stp     x21, x22, [sp, #32]
13bc:   2a0103f6    mov     w22, w1
13c0:   a90363f7    stp     x23, x24, [sp, #48]
13c4:   a90573fb    stp     x27, x28, [sp, #80]
13c8:   f9400001    ldr     x1, [x0]
13cc:   f9004fe1    str     x1, [sp, #152]
13d0:   d2800001    mov     x1, #0x0        // #0
13d4:   710006df    cmp     w22, #0x1
13d8:   54000c28    b.hi    155c            // b.pmore
13dc:   d1004041    sub     x1, x2, #0x10
13e0:   aa0203f3    mov     x19, x2
13e4:   b27c4fe0    mov     x0, #0xfffff0   // #16777200
13e8:   eb00003f    cmp     x1, x0
13ec:   54000bc8    b.hi    1564            // b.pmore
13f0:   9101a3f5    add     x21, sp, #0x68
13f4:   aa0303e2    mov     x2, x3
13f8:   aa0403f8    mov     x24, x4
13fc:   aa0503f7    mov     x23, x5
1400:   aa1503e3    mov     x3, x21
1404:   91048320    add     x0, x25, #0x120
1408:   52800021    mov     w1, #0x1        // #1
140c:   94000000    bl      1210
1410:   2a0003f4    mov     w20, w0
1414:   35000540    cbnz    w0, 14bc
1418:   520002db    eor     w27, w22, #0x1
141c:   d344fe7a    lsr     x26, x19, #4
1420:   1200037b    and     w27, w27, #0x1
1424:   92400e73    and     x19, x19, #0xf
1428:   910223fc    add     x28, sp, #0x88
142c:   d100075a    sub     x26, x26, #0x1
1430:   b100075f    cmn     x26, #0x1
1434:   54000541    b.ne    14dc            // b.any
1438:   b4000433    cbz     x19, 14bc
143c:   710002df    cmp     w22, #0x0
1440:   d10042fb    sub     x27, x23, #0x10
1444:   9101e3fa    add     x26, sp, #0x78
1448:   aa1303e2    mov     x2, x19
144c:   9a95035a    csel    x26, x26, x21, eq   // eq = none
1450:   aa1b03e1    mov     x1, x27
1454:   910223f5    add     x21, sp, #0x88
1458:   aa1703e0    mov     x0, x23
145c:   94000000    bl      0
1460:   d2800217    mov     x23, #0x10      // #16
1464:   aa1303e3    mov     x3, x19
1468:   aa1a03e2    mov     x2, x26
146c:   aa1803e1    mov     x1, x24
1470:   aa1503e0    mov     x0, x21
1474:   94000000    bl      0
1478:   cb1302e3    sub     x3, x23, x19
147c:   8b130342    add     x2, x26, x19
1480:   8b130361    add     x1, x27, x19
1484:   8b1302a0    add     x0, x21, x19
1488:   94000000    bl      0
148c:   aa1503e3    mov     x3, x21
1490:   aa1503e2    mov     x2, x21
1494:   2a1603e1    mov     w1, w22
1498:   aa1903e0    mov     x0, x25
149c:   94000000    bl      1210
14a0:   2a0003f4    mov     w20, w0
14a4:   350000c0    cbnz    w0, 14bc
14a8:   aa1703e3    mov     x3, x23
14ac:   aa1a03e2    mov     x2, x26
14b0:   aa1503e1    mov     x1, x21
14b4:   aa1b03e0    mov     x0, x27
14b8:   94000000    bl      0
14bc:   90000000    adrp    x0, 0 <__stack_chk_guard>
14c0:   f9400000    ldr     x0, [x0]
14c4:   f9404fe2    ldr     x2, [sp, #152]
14c8:   f9400001    ldr     x1, [x0]
14cc:   eb010042    subs    x2, x2, x1
14d0:   d2800001    mov     x1, #0x0        // #0
14d4:   54000500    b.eq    1574            // b.none
14d8:   94000000    bl      0 <__stack_chk_fail>
14dc:   f100027f    cmp     x19, #0x0
14e0:   1a9f07e0    cset    w0, ne          // ne = any
14e4:   6a1b001f    tst     w0, w27
14e8:   540000e0    b.eq    1504            // b.none
14ec:   b50000da    cbnz    x26, 1504
14f0:   a94687e0    ldp     x0, x1, [sp, #104]
14f4:   a90787e0    stp     x0, x1, [sp, #120]
14f8:   aa1503e1    mov     x1, x21
14fc:   aa1503e0    mov     x0, x21
1500:   97fffb63    bl      28c
1504:   aa1503e2    mov     x2, x2

[Bug modula2/110779] SysClock can not read the clock

2023-08-08 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110779

--- Comment #9 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #8 from Gaius Mulley  ---
> Created attachment 55703
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55703&action=edit
> Proposed fix (addendum)
>
> Here is a patch which tests for all the functions and structs in wrapclock.cc.

That patch does restore i386-pc-solaris2.11 bootstrap, thanks.

Testsuite results are good, with two exceptions: for 32-bit compilation
(only, 64-bit is fine), two tests reliably time out at all optimization
levels:

WARNING: gm2/iso/run/pass/m2date.mod execution,  -O  program timed out.

WARNING: gm2/iso/run/pass/testclock2.mod execution,  -O  program timed out.

Running them under truss shows that the last system call each is

5032:   clock_settime(3, 0x080975F0)Err#1 EPERM [sys_time]

Running testclock2.mod under gdb shows

Thread 2 received signal SIGINT, Interrupt.
[Switching to Thread 1 (LWP 1)]
0x08064ad4 in daysInMonth (year=42582828, month=7)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:225
225 BEGIN
(gdb) bt
#0  0x08064ad4 in daysInMonth (year=42582828, month=7)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:225
#1  0x08065152 in daysInYear (year=42582828, month=, 
day=)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:132
#2  ExtractDate (day=@0x808748c: 0, month=@0x8087484: 0, year=@0x8087480: 0, 
days=53508608997914)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:152
#3  m2iso_SysClock_GetClock (userData=...)
at
/vol/gcc/src/hg/master/local/libgm2/libm2iso/../../gcc/m2/gm2-libs-iso/SysClock.mod:204
#4  0x0805f652 in _M2_testclock2_init ()
#5  0x08068d7a in m2pim_M2Dependent_ConstructModules (
applicationmodule=0x805c2b4, libname=0x805c2bf, 
overrideliborder=0x805c2d8, argc=1, argv=0xfeffdafc, envp=0xfeffdb04)
at
/vol/gcc/src/hg/master/local/libgm2/libm2pim/../../gcc/m2/gm2-libs/M2Dependent.mod:809
#6  0x0805fca9 in _M2_init ()
#7  0x0805fcee in main ()

That year value seems very strange.

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #12 from Jonathan Wakely  ---
That would also benefit this overload:

  basic_string&
  assign(initializer_list<_CharT> __l)
  { return this->assign(__l.begin(), __l.size()); }

That currently goes via replace(begin(), end(), l.begin(), l.end()) but we know
that an initializer_list cannot possibly alias the string's contents.

But we can do even better and avoid any copy if __l.size() <= capacity():

  basic_string&
  assign(initializer_list<_CharT> __l)
  {
    const size_type __n = __l.size();
    if (__n > capacity())
      *this = basic_string(__l.begin(), __l.end(), get_allocator());
    else
      {
        if (__n)
          _S_copy(_M_data(), __l.begin(), __n);
        _M_set_length(__n);
      }
    return *this;
  }
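
The same capacity-reuse idea can be approximated from user code. The sketch below is an assumption-laden illustration using only the public std::string API; assign_list is a made-up name, and resize plus std::copy stands in for the internal _S_copy and _M_set_length calls (so it may do slightly more work, since resize value-initializes any characters it adds).

```cpp
#include <algorithm>
#include <initializer_list>
#include <string>

// Hypothetical free function: reuse the existing buffer when the list fits,
// otherwise build a new string once and move it in. An initializer_list
// cannot alias the string's contents, so no overlap handling is needed.
std::string& assign_list(std::string& dest, std::initializer_list<char> il) {
    if (il.size() > dest.capacity()) {
        dest = std::string(il.begin(), il.end(), dest.get_allocator());
    } else {
        dest.resize(il.size());  // stays within the current capacity
        std::copy(il.begin(), il.end(), dest.begin());
    }
    return dest;
}
```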

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #13 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #11)
> With this change we make a single allocation+copy and then do a cheap move
> assignment:
> 
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -1711,4 +1711,4 @@
>  basic_string&
>  assign(_InputIterator __first, _InputIterator __last)
> -{ return this->replace(begin(), end(), __first, __last); }
> +{ return *this = basic_string(__first, __last, get_allocator()); }

Except it's not cheap for C++98 because it's a copy, so we're back to
allocating and copying everything twice. That's solvable though.

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
So basically missed inlining at -Os, even memcpy wrappers are not inlined.

Can you provide a reproducible testcase?

Note that inline functions in mbedtls/library/alignment.h all miss the 'static'
qualifier, which affects inlining decisions, and looks like a mistake anyway
(if they are really meant to be non-static inlines, shouldn't there be a
comment?)

Does making them 'static inline' rectify the problem?

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:31ec413098bd334115aff73fc755e49afd3ac371

commit r14-3076-g31ec413098bd334115aff73fc755e49afd3ac371
Author: Richard Biener 
Date:   Tue Aug 8 12:46:42 2023 +0200

tree-optimization/110924 - fix vop liveness for noreturn const CFG parts

The virtual operand live problem used by sinking assumes we have
virtual uses at each end point of the CFG but as shown in the PR
this isn't true for parts for example ending in __builtin_unreachable.
The following removes the optimization made possible by this and
now requires marking backedges.

PR tree-optimization/110924
* tree-ssa-live.h (virtual_operand_live): Update comment.
* tree-ssa-live.cc (virtual_operand_live::get_live_in): Remove
optimization, look at each predecessor.
* tree-ssa-sink.cc (pass_sink_code::execute): Mark backedges.

* gcc.dg/torture/pr110924.c: New testcase.

[Bug tree-optimization/110924] [14 Regression] ICE on valid code at -O{s,2,3} on x86_64-linux-gnu: verify_ssa failed

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110924

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Richard Biener  ---
Fixed.

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Richard Biener  changed:

   What|Removed |Added

  Component|other   |ipa
 Target||aarch64
   Keywords||missed-optimization
 CC||marxin at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
Note you shouldn't use -Os if you care about performance.  GCC is quite
reasonable with code size increases at -O2 (as compared to other compilers). 
Instead I suggest you use -flto with -O2 to decrease the size of the final
executable/library and give GCC better knowledge on unit growth.

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Dave Rodgman  changed:

   What|Removed |Added

   Keywords|missed-optimization |
  Component|ipa |other
 Target|aarch64 |

--- Comment #4 from Dave Rodgman  ---
From a quick test, it doesn't look like the unaligned access inlining is the
issue:

Not static inline, -Os:
  AES-XTS-128  : 853799 KiB/s,  0 cycles/byte
  AES-XTS-256  : 749919 KiB/s,  0 cycles/byte

Static inline, -Os:

  AES-XTS-128  : 885380 KiB/s,  0 cycles/byte
  AES-XTS-256  : 752995 KiB/s,  0 cycles/byte

Not static inline, -O2:
  AES-XTS-128  :2822656 KiB/s,  0 cycles/byte
  AES-XTS-256  :2425721 KiB/s,  0 cycles/byte

Static inline, -O2:
  AES-XTS-128  :2692321 KiB/s,  0 cycles/byte
  AES-XTS-256  :2446391 KiB/s,  0 cycles/byte

[Bug tree-optimization/49955] Fails to do partial basic-block SLP

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:d9f3ea61fe36e2de3354b90b65ff8245099114c9

commit r14-3078-gd9f3ea61fe36e2de3354b90b65ff8245099114c9
Author: Richard Biener 
Date:   Mon Aug 7 14:44:20 2023 +0200

tree-optimization/49955 - BB reduction with odd number of lanes

The following enhances BB reduction vectorization to support
vectorizing only a subset of the lanes, keeping the rest as
scalar ops.  For now we try to make the number of lanes even
by leaving alone the "last" lane.  That's because SLP discovery
with all lanes will fail too soon to get us any hint on which
lane to strip and likewise we don't know what vector modes the
target supports so restricting ourselves to power-of-two or
other cases isn't easy.

This is enough to get at the vectorization opportunity for the
testcase in the PR - albeit with the chosen lanes not optimal
but at least vectorizable.

PR tree-optimization/49955
* tree-vectorizer.h (_slp_instance::remain_stmts): New.
(SLP_INSTANCE_REMAIN_STMTS): Likewise.
* tree-vect-slp.cc (vect_free_slp_instance): Release
SLP_INSTANCE_REMAIN_STMTS.
(vect_build_slp_instance): Make the number of lanes of
a BB reduction even.
(vectorize_slp_instance_root_stmt): Handle unvectorized
defs of a BB reduction.

* gfortran.dg/vect/pr49955.f: New testcase.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 49955, which changed state.

Bug 49955 Summary: Fails to do partial basic-block SLP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/49955] Fails to do partial basic-block SLP

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0

--- Comment #7 from Richard Biener  ---
This is fixed now.

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #5 from Dave Rodgman  ---
(In reply to Richard Biener from comment #3)
> Note you shouldn't use -Os if you care about performance.  GCC is quite
> reasonable with code size increases at -O2 (as compared to other compilers).
> Instead I suggest you use -flto with -O2 to decrease the size of the final
> executable/library and give GCC better knowledge on unit growth.

Understood, but I think it depends on the magnitude of the perf difference. I'd
expect a smallish perf drop, say 10%, from -Os to be reasonable, but I'd
consider a 3x perf difference to be a compiler issue.

(In reply to Alexander Monakov from comment #2)
> So basically missed inlining at -Os, even memcpy wrappers are not inlined.
> 
> Can you provide a reproducible testcase?
> 
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
> mistake anyway (if they are really meant to be non-static inlines, shouldn't
> there be a comment?)
> 
> Does making them 'static inline' rectify the problem?

The easiest way to reproduce is to use the benchmark tool:

make programs/test/benchmark CC=gcc CFLAGS="-Os"
programs/test/benchmark aes_xts

I don't have a compact reproducer, sorry.

[Bug target/110899] RFE: Attributes preserve_most and preserve_all

2023-08-08 Thread matz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899

--- Comment #11 from Michael Matz  ---
(In reply to Florian Weimer from comment #10)
> (In reply to Michael Matz from comment #9)
> > > > I don't see how that helps.  Imagine a preserve_all function foo that calls
> > > > printf.  How do you propose that 'foo' saves all parts of the SSE registers,
> > > > even those that aren't invented yet, or those that can't be touched by the
> > > > current ISA?  (printf might clobber all of these)
> > > Vector registers are out of scope for this.
> > 
> > Why do you say that?  From clang: "Furthermore it also preserves all
> > floating-point registers (XMMs/YMMs)."  (for preserve_all, but this
> > bugreport does include that variant of the attribute).
> 
> Ugh, I preferred not to look at it because it's likely that the Clang
> implementation is broken (not future-proof).

I see, then we need to make clear that we aren't going to do anything about
preserve_all with clang's wording, in the context of this report.

FWIW, in my implementation referred to above I chose to also have two variants:
one saving/restoring only the SSE2 parts of *mm8-*mm15 (i.e. xmm8-xmm15),
and one guaranteeing not to clobber anything of *mm8-*mm15.  (No guarantees
about the *mm16 upwards).  The first variant can call foreign functions,
the second variant simply is allowed to only call functions that also give
that guarantee.

(There is also the question of mask registers, the clang docu doesn't talk
about them.  And I still would like to know the reason for the seemingly
arbitrary choice to leave some regs call clobbered for aarch64).

> > > But lets look at APX. If printf is recompiled to use APX, then it will
> > > clobber the extended register file. If we define __preserve_most__ the way
> > > we do in my psABI proposal (i.e., *not* as everything but %r11), the
> > > extended APX registers are still caller-saved.
> > 
> > Right, for preserve_most _with your wording_ it works out.  preserve_all
> > or preserve_most with clang wording doesn't.
> 
> In glibc, we already use a full context switch with XSAVE for the dynamic
> loader trampoline. As far as I understand it, it's not future-proof. The
> kernel could provide an interface that is guaranteed to work because it only
> enables those parts of the register file that it can context-switch. I can
> probably get the userspace-only implementation into glibc, but the kernel
> interface seems unlikely. We'd also have to work out the interaction of
> preserve_all and unwinding, setjmp etc.; not sure if there is a proper
> solution for that.

There are a couple of possibilities to implement a halfway solution for this,
via XSAVE and friends, or via runtime dispatching dependent on the current CPU
(e.g. provide a generic save/restore-stuff function in libgcc).  The problem
will always be where the memory for this save/restore pattern should come from,
because its size isn't constant at compile time.  That's also solvable, but it's
becoming more and more hairy.

That's why I chose to simply disallow calling foreign functions from those
that want to give very strict guarantees.  We could do the same for
preserve_all, if we absolutely want to have it.

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

--- Comment #14 from Jonathan Wakely  ---
Ah, and the patch will pessimize cases like str.assign(str2.begin(),
str2.end()) where str.capacity() >= str2.capacity().

The current implementation in terms of replace(begin(), end(), first, last)
handles that, because replace is overloaded for string iterators and pointers.

Also solvable, but it's getting more complicated.

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread dave.rodgman at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #6 from Dave Rodgman  ---
Under clang, we see that mbedtls_xor being inlined, or not, causes an
equivalent perf difference. Note that mbedtls_xor is inline in the gcc O2
version and not in the gcc Os version.

Not inline mbedtls_xor, -Os clang:
  AES-XTS-128  : 834549 KiB/s,  0 cycles/byte
  AES-XTS-256  : 674383 KiB/s,  0 cycles/byte

Inline mbedtls_xor, -Os clang:
  AES-XTS-128  :2664799 KiB/s,  0 cycles/byte
  AES-XTS-256  :2278008 KiB/s,  0 cycles/byte


However, if I mark mbedtls_xor as static inline (actually, for testing
purposes, I created a static inline copy in aes.c), gcc still does not inline
it. I am not sure why. If I use "__attribute__((always_inline))" gcc will
inline it.

So it looks like gcc is overly averse to inlining this function, or is getting
the cost/benefit of inlining wrong here?

For 3/5 cases, we know at compile time that n == 16, so the function will
compile to four instructions:

139c:   3dc00021    ldr     q1, [x1]
13a0:   3dc00040    ldr     q0, [x2]
13a4:   6e211c00    eor     v0.16b, v0.16b, v1.16b
13a8:   3d800000    str     q0, [x0]

so it does seem surprising that gcc doesn't want to inline this.
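
For anyone wanting to experiment without the full Mbed TLS tree, here is a small stand-in for the pattern described. It is not the mbedtls_xor source: xor_bytes and xor_block16 are made-up names, and the plain byte loop is only an assumption about the general shape of such a helper. The always_inline attribute is the GCC/Clang one mentioned above.

```cpp
#include <cstddef>

// Hypothetical stand-in for a byte-wise XOR helper (not the Mbed TLS code).
// always_inline forces inlining even where the -Os heuristics would decline.
#if defined(__GNUC__)
__attribute__((always_inline))
#endif
static inline void xor_bytes(unsigned char* r, const unsigned char* a,
                             const unsigned char* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        r[i] = static_cast<unsigned char>(a[i] ^ b[i]);
}

// Caller where n is a compile-time constant, as in the n == 16 cases above.
// Once xor_bytes is inlined here, the compiler can see the fixed length.
void xor_block16(unsigned char* r, const unsigned char* a,
                 const unsigned char* b) {
    xor_bytes(r, a, b, 16);
}
```

Compiling this with and without the attribute at -Os and comparing the disassembly of xor_block16 should show whether the call survives.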

[Bug fortran/109684] compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-08-08 Thread pault at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109684

--- Comment #13 from Paul Thomas  ---
(In reply to Steve Kargl from comment #12)
> On Mon, Aug 07, 2023 at 10:04:54PM +, kargl at gcc dot gnu.org wrote:
> > 
> > diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
> > index 3cd470ddcca..b0bb8bc1471 100644
> > --- a/gcc/fortran/resolve.cc
> > +++ b/gcc/fortran/resolve.cc
> > @@ -17966,7 +17966,9 @@ resolve_types (gfc_namespace *ns)
> > 
> >for (n = ns->contained; n; n = n->sibling)
> >  {
> > -  if (gfc_pure (ns->proc_name) && !gfc_pure (n->proc_name))
> > +  if (gfc_pure (ns->proc_name)
> > + && !gfc_pure (n->proc_name)
> > + && !n->proc_name->attr.artificial)
> > gfc_error ("Contained procedure %qs at %L of a PURE procedure must "
> >"also be PURE", n->proc_name->name,
> >&n->proc_name->declared_at);
> > 
> > pault, does the above look correct?
> > 
> 
> This patch passes a regression test with no new regressions
> on x86_64-*-*freebsd.

Hi Steve,

That will certainly fix the bug. An alternative crosses my mind, which is to
check the pureness of the final routines as the wrapper is being built and give
the wrapper the pure attribute if they are all pure.

Cheers

Paul

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-08-08 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #15 from Martin Uecker  ---
GCC seems to allocate enough for sizeof(struct foo) + n * sizeof(char) but not
for sizeof(struct { int a; char b; char t[n]; }).
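
Illustration (not part of the original comment): a minimal sketch of the two
size computations being compared, for the struct in the bug title. The helper
names and struct s1 are hypothetical, and the concrete numbers in the note
below assume a typical ABI with a 4-byte, 4-byte-aligned int.

```c
#include <assert.h>
#include <stddef.h>

struct s  { int a; char b; char t[]; };   /* flexible array member */
struct s1 { int a; char b; char t[1]; };  /* same members, t has one element */

/* "Header plus n elements": sizeof(struct s) + n * sizeof(char). */
static size_t
size_header_plus(size_t n)
{
    return sizeof(struct s) + n * sizeof(char);
}

/* Smallest size that still covers t[0..n-1]: offset of t plus n elements. */
static size_t
size_offset_plus(size_t n)
{
    assert(offsetof(struct s, t) <= sizeof(struct s));
    return offsetof(struct s, t) + n * sizeof(char);
}
```

Under that ABI assumption sizeof(struct s) is 8 and offsetof(struct s, t) is 5,
so for n == 1 the first formula gives 9 (the figure in the bug title) while
sizeof(struct s1) is 8.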

[Bug c/110947] New: Should -Wmissing-variable-declarations not trigger on register variables?

2023-08-08 Thread ndesaulniers at google dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110947

Bug ID: 110947
   Summary: Should -Wmissing-variable-declarations not trigger on
register variables?
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ndesaulniers at google dot com
  Target Milestone: ---

Via:
https://lore.kernel.org/all/CAKwvOd=8kxkD9p+WW-F047ShN=r32slyyfpgzhydw3bxtdd...@mail.gmail.com/

I'm looking to enable -Wmissing-variable-declarations in the Linux kernel
(gcc-14 just gained support for this warning).

In one case I noticed that:

register unsigned long current_stack_pointer asm("rsp");

declared in a header would trigger this:

> no previous declaration for 'current_stack_pointer' 
> [-Wmissing-variable-declarations]

So we could add:

extern unsigned long current_stack_pointer;

before the

register unsigned long current_stack_pointer asm("rsp");

but that seems excessive. Perhaps we can simply not diagnose in that case?

Filed this bug report against clang as well:
https://github.com/llvm/llvm-project/issues/64509

[Bug tree-optimization/28794] missed optimization with non-COND_EXPR and vrp and comparisions

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28794

--- Comment #8 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:aadc5c07feb0ab08729ab25d0d896b55860ad9e6

commit r14-3084-gaadc5c07feb0ab08729ab25d0d896b55860ad9e6
Author: Andrew Pinski 
Date:   Mon Aug 7 00:05:21 2023 -0700

VR-VALUES [PR28794]: optimize compare assignments also

This patch fixes the oldish (2006) bug where VRP was not
optimizing the comparison for assignments while handling
them for GIMPLE_COND only.
It just happens to also solve PR 103281 due to allowing
to optimize `c < 1` to `c == 0` and then we get
`(c == 0) == c` (which was handled by r14-2501-g285c9d04).

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

PR tree-optimization/103281
PR tree-optimization/28794

gcc/ChangeLog:

* vr-values.cc
(simplify_using_ranges::simplify_cond_using_ranges_1): Split out
majority to ...
(simplify_using_ranges::simplify_compare_using_ranges_1): Here.
(simplify_using_ranges::simplify_casted_cond): Rename to ...
(simplify_using_ranges::simplify_casted_compare): This
and change arguments to take op0 and op1.
(simplify_using_ranges::simplify_compare_assign_using_ranges_1):
New method.
(simplify_using_ranges::simplify): For tcc_comparison assignments
call
simplify_compare_assign_using_ranges_1.
* vr-values.h (simplify_using_ranges): Add
new methods, simplify_compare_using_ranges_1 and
simplify_compare_assign_using_ranges_1.
Rename simplify_casted_cond and simplify_casted_compare and
update argument types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr103281-1.c: New test.
* gcc.dg/tree-ssa/vrp-compare-1.c: New test.
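
Illustration (not one of the new tests): a hypothetical pair of functions
showing the two statement shapes the commit message contrasts, a comparison
used as a branch condition and the same comparison assigned to a variable. No
claim is made here about which GCC versions fold either form.

```c
#include <assert.h>

/* Comparison used directly as a branch condition (a GIMPLE_COND). */
static int
compare_in_condition(int x)
{
    if (x > 10) {
        if (x > 5)      /* always true here: x is known to be > 10 */
            return 1;
        return 0;
    }
    return -1;
}

/* Same comparison, but its result is assigned to a variable first. */
static int
compare_in_assignment(int x)
{
    if (x > 10) {
        int t = x > 5;  /* also always 1 here */
        return t;
    }
    return -1;
}

/* Both forms must agree for every input. */
static void
check_forms_agree(int x)
{
    assert(compare_in_condition(x) == compare_in_assignment(x));
}
```

In both functions the range of x inside the outer branch makes the inner
comparison a constant, which is the kind of fact a range-based pass can use.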

[Bug tree-optimization/103281] [12/13/14 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103281

--- Comment #11 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:aadc5c07feb0ab08729ab25d0d896b55860ad9e6

commit r14-3084-gaadc5c07feb0ab08729ab25d0d896b55860ad9e6
Author: Andrew Pinski 
Date:   Mon Aug 7 00:05:21 2023 -0700

VR-VALUES [PR28794]: optimize compare assignments also

This patch fixes the oldish (2006) bug where VRP was not
optimizing the comparison for assignments while handling
them for GIMPLE_COND only.
It just happens to also solve PR 103281 due to allowing
to optimize `c < 1` to `c == 0` and then we get
`(c == 0) == c` (which was handled by r14-2501-g285c9d04).

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

PR tree-optimization/103281
PR tree-optimization/28794

gcc/ChangeLog:

* vr-values.cc
(simplify_using_ranges::simplify_cond_using_ranges_1): Split out
majority to ...
(simplify_using_ranges::simplify_compare_using_ranges_1): Here.
(simplify_using_ranges::simplify_casted_cond): Rename to ...
(simplify_using_ranges::simplify_casted_compare): This
and change arguments to take op0 and op1.
(simplify_using_ranges::simplify_compare_assign_using_ranges_1):
New method.
(simplify_using_ranges::simplify): For tcc_comparison assignments
call
simplify_compare_assign_using_ranges_1.
* vr-values.h (simplify_using_ranges): Add
new methods, simplify_compare_using_ranges_1 and
simplify_compare_assign_using_ranges_1.
Rename simplify_casted_cond and simplify_casted_compare and
update argument types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr103281-1.c: New test.
* gcc.dg/tree-ssa/vrp-compare-1.c: New test.

[Bug c/65213] Extend -Wmissing-declarations to variables [i.e. add -Wmissing-variable-declarations]

2023-08-08 Thread ndesaulniers at google dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65213

Nick Desaulniers  changed:

   What|Removed |Added

 CC||ndesaulniers at google dot com

--- Comment #6 from Nick Desaulniers  ---
Thanks for implementing this!

I filed a follow-up on how this diagnostic interacts with `register`
storage variables.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110947 PTAL

[Bug c/110947] Should -Wmissing-variable-declarations not trigger on register variables?

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110947

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||diagnostic
   Last reconfirmed||2023-08-08
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug tree-optimization/28794] missed optimization with non-COND_EXPR and vrp and comparisions

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28794

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0

--- Comment #9 from Andrew Pinski  ---
Fixed finally.

[Bug tree-optimization/103281] [12/13/14 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103281
Bug 103281 depends on bug 28794, which changed state.

Bug 28794 Summary: missed optimization with non-COND_EXPR and vrp and 
comparisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28794

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug c++/110948] New: Incorrect -Winvalid-constexpr on virtual defaulted operator==

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110948

Bug ID: 110948
   Summary: Incorrect -Winvalid-constexpr on virtual defaulted
operator==
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: arthur.j.odwyer at gmail dot com
  Target Milestone: ---

Bug #98712 seems related.

// https://godbolt.org/z/eKKxcovEn
struct D;
struct B {
bool operator==(const B&) const;
virtual bool operator==(const D&) const;
};
struct D : B {
bool operator==(const D&) const override = default;
};

GCC alone gives this bogus -Winvalid-constexpr warning:

: In member function 'virtual constexpr bool D::operator==(const D&)
const':
:10:10: warning: call to non-'constexpr' function 'bool
B::operator==(const B&) const' [-Winvalid-constexpr]
   10 | bool operator==(const D&) const override = default;
  |  ^~~~
:5:10: note: 'bool B::operator==(const B&) const' declared here
5 | bool operator==(const B&) const;
  |  ^~~~

This is obviously contrived code, but the symptom might indicate that GCC is
too eager to pretend that the user actually wrote `constexpr`, in situations
where the compiler is merely supposed to make an implicitly defined function
constexpr-friendly if possible.

The AFAICT-analogous situation with `operator=` instead of `operator==`
correctly compiles without warning: https://godbolt.org/z/Mof1qaadr

[Bug c++/101943] ICE: Segmentation fault (in cat_tag_for)

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101943

Arthur O'Dwyer  changed:

   What|Removed |Added

 CC||arthur.j.odwyer at gmail dot 
com

--- Comment #4 from Arthur O'Dwyer  ---
This seems to be fixed since GCC 11.1; should it be RESOLVED FIXED at this
point?
https://godbolt.org/z/Tox8f716q

[Bug tree-optimization/103281] [12/13/14 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103281

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|10.5|14.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Andrew Pinski  ---
Fixed for GCC 14. Since this is a missed optimization with generated code,
it is less likely to show up in real code, so closing as fixed.

[Bug c++/94162] ICE [neg] bad return type in defaulted <=>

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94162

Arthur O'Dwyer  changed:

   What|Removed |Added

 CC||arthur.j.odwyer at gmail dot 
com

--- Comment #15 from Arthur O'Dwyer  ---
The test case in #c10 seems to be fixed since GCC 12; the rest were fixed since
GCC 11. Should this bug be RESOLVED FIXED at this point?
https://godbolt.org/z/d16x181xh

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Andrew Pinski  changed:

   What|Removed |Added

 Depends on||92716

--- Comment #7 from Andrew Pinski  ---
I am 99% sure this is basically PR 92716.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92716
[Bug 92716] -Os doesn't inline byteswap function even though it's a single
instruction

[Bug libstdc++/110945] std::basic_string::assign dramatically slower than other means of copying memory

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945

Jonathan Wakely  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug libstdc++/110862] format out of bounds read on format string "{0:{0}"

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110862

--- Comment #5 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:55eb7e92a60adfae43aaf58bb9c81050d39d82c9

commit r13-7697-g55eb7e92a60adfae43aaf58bb9c81050d39d82c9
Author: Jonathan Wakely 
Date:   Thu Aug 3 08:45:43 2023 +0100

libstdc++: Fix past-the-end increment in std::format [PR110862]

At the end of a replacement field we should check that the closing brace
is actually present before incrementing past it.

libstdc++-v3/ChangeLog:

PR libstdc++/110862
* include/std/format (_Scanner::_M_on_replacement_field):
Check for expected '}' before incrementing iterator.
* testsuite/std/format/string.cc: Check "{0:{0}" format string.

(cherry picked from commit 5d87f71bb462ccb78dd3d9d810ea08d96869cb4b)
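
Illustration (not the libstdc++ code): a toy scanner in C, assuming only the
field shapes "{N}" and "{N:{M}}", that checks for the closing brace before
stepping past it. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Advance past any decimal digits (an arg-id). */
static const char *
skip_digits(const char *it, const char *end)
{
    while (it != end && *it >= '0' && *it <= '9')
        ++it;
    return it;
}

/* Scan one replacement field that starts at '{'.  Returns a pointer just
   past the closing '}', or NULL if the field is malformed. */
static const char *
scan_field(const char *it, const char *end, int nested)
{
    assert(it != end && *it == '{');
    it = skip_digits(it + 1, end);
    if (!nested && it != end && *it == ':') {
        ++it;
        if (it != end && *it == '{') {
            it = scan_field(it, end, 1);
            if (it == NULL)
                return NULL;
        }
    }
    /* Check that the closing brace is present before stepping past it. */
    if (it == end || *it != '}')
        return NULL;
    return it + 1;
}

/* 1 if s consists of exactly one well-formed replacement field. */
static int
is_single_field(const char *s)
{
    const char *end = s + strlen(s);
    if (s == end || *s != '{')
        return 0;
    return scan_field(s, end, 0) == end;
}
```

For the string from the bug title, "{0:{0}", the nested field closes but the
outer one does not, so the scanner reports an error instead of reading past
the end of the string.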

[Bug libstdc++/110917] std::format_to(int*, ...) fails to compile because of _S_make_span

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110917

--- Comment #5 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:0f0152a93d15b24ebc7f6c7455baaded6a63fb2e

commit r13-7698-g0f0152a93d15b24ebc7f6c7455baaded6a63fb2e
Author: Jonathan Wakely 
Date:   Mon Aug 7 14:37:25 2023 +0100

libstdc++: Constrain __format::_Iter_sink for contiguous iterators
[PR110917]

We can't write to a span<_CharT> if the contiguous iterator has a value
type that isn't _CharT.

libstdc++-v3/ChangeLog:

PR libstdc++/110917
* include/std/format (__format::_Iter_sink):
Constrain partial specialization for contiguous iterators to
require the value type to be CharT.
* testsuite/std/format/functions/format_to.cc: New test.

(cherry picked from commit c5ea5aecac323e9094e4dc967f54090cb244bc6a)

[Bug libstdc++/110860] std::format("{:f}",2e304) invokes undefined behaviour

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110860

--- Comment #6 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:a059403794add2934961780662e320ba77798a7e

commit r13-7699-ga059403794add2934961780662e320ba77798a7e
Author: Jonathan Wakely 
Date:   Mon Aug 7 15:30:03 2023 +0100

libstdc++: Fix incorrect use of abs and log10 in std::format [PR110860]

The std::formatter implementation for floating-point types uses
__builtin_abs and __builtin_log10 to avoid including all of <cmath>, but
those functions are not generic. The result of abs(2e304) is -INT_MIN
which is undefined, and then log10(INT_MIN) is NaN. As well as being
undefined, we fail to grow the buffer correctly, and then loop more
times than needed to allocate a buffer and try formatting the value into
it again.

We can use if-constexpr to choose the correct form of log10 to use for
the type, and avoid using abs entirely. This avoids the undefined
behaviour and should mean we only reallocate and retry std::to_chars
once.

libstdc++-v3/ChangeLog:

PR libstdc++/110860
* include/std/format (__formatter_fp::format): Do not use
__builtin_abs and __builtin_log10 with arbitrary floating-point
types.

(cherry picked from commit bb3ceeb6520c13fc5ca08af7d43fbd3f975e72b0)
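
Illustration (not the libstdc++ fix): a C sketch of why a value such as 2e304
must not be routed through an int, plus a digit count done purely in
floating point. Repeated division is swapped in for log10 here to keep the
sketch free of libm; the function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* Conservative check that x lies within the range of int, so that a
   conversion to int would be defined.  NaN fails the check. */
static int
fits_in_int(double x)
{
    return x >= (double)INT_MIN && x <= (double)INT_MAX;
}

/* Count the decimal digits of the integer part of |x| for finite x.
   Repeated division stands in for floor(log10(x)) + 1; rounding may make
   it off by one very close to a power of ten. */
static int
integer_digits(double x)
{
    if (x < 0)
        x = -x;                       /* floating-point negation only */
    assert(x == x && x <= DBL_MAX);   /* reject NaN and infinity */
    int digits = 1;
    while (x >= 10.0) {
        x /= 10.0;
        ++digits;
    }
    return digits;
}
```

The value 2e304 fails fits_in_int, yet its integer part still has a
well-defined digit count when the arithmetic stays in double.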

[Bug libstdc++/110917] std::format_to(int*, ...) fails to compile because of _S_make_span

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110917

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #6 from Jonathan Wakely  ---
Fixed for 13.3, thanks for the report.

[Bug libstdc++/110862] format out of bounds read on format string "{0:{0}"

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110862

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Jonathan Wakely  ---
Fixed for 13.3, thanks for the report.

[Bug libstdc++/110860] std::format("{:f}",2e304) invokes undefined behaviour

2023-08-08 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110860

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Jonathan Wakely  ---
Fixed for 13.3, thanks for the report.

[Bug ipa/110378] IPA-SRA for destructors

2023-08-08 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110378

Christophe Lyon  changed:

   What|Removed |Added

 CC||clyon at gcc dot gnu.org

--- Comment #7 from Christophe Lyon  ---
The new test fails on arm:
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++14  scan-ipa-dump sra "Will split
parameter 0"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++14  scan-tree-dump-not optimized
"shouldnotexist"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++17  scan-ipa-dump sra "Will split
parameter 0"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++17  scan-tree-dump-not optimized
"shouldnotexist"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++20  scan-ipa-dump sra "Will split
parameter 0"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++20  scan-tree-dump-not optimized
"shouldnotexist"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++98  scan-ipa-dump sra "Will split
parameter 0"
FAIL: g++.dg/ipa/pr110378-1.C -std=gnu++98  scan-tree-dump-not optimized
"shouldnotexist"

I'm attaching pr110378-1.C.083i.sra

[Bug ipa/110378] IPA-SRA for destructors

2023-08-08 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110378

--- Comment #8 from Christophe Lyon  ---
Created attachment 55707
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55707&action=edit
pr110378-1.C.083i.sra

[Bug tree-optimization/110941] [14 Regression] Dead Code Elimination Regression at -O3 since r14-2379-gc496d15954c

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110941

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-08-08
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
  # RANGE [irange] unsigned int [0, 16][20, 20][24, 24][65532,
4294901776][4294967293, +INF] MASK 0xfffc VALUE 0x0
  # ivtmp.18_37 = PHI 
  # RANGE [irange] unsigned short [0, +INF] MASK 0xfffc VALUE 0xfffc
  _21 = (unsigned short) ivtmp.18_37;
  if (_21 <= 24)


Confirmed. The range for _21 is way too conservative.
It should have been: `[0, 16][20, 20][24, 24]`.

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #8 from Alexander Monakov  ---
Why? There's no bswap here, in particular mbedtls_put_unaligned_uint64 is a
straightforward wrapper for memcpy:

inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
{
memcpy(p, &x, sizeof(x));
}


We are deciding not to inline this, while inlining its get_unaligned counterpart?
Seems bizarre.
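
Illustration (not the mbedtls definitions): a minimal sketch of a put/get pair
built on memcpy, with hypothetical names and static linkage chosen only to
keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store x at p, which may be unaligned. */
static inline void
put_unaligned_u64(void *p, uint64_t x)
{
    assert(p != NULL);
    memcpy(p, &x, sizeof(x));
}

/* Load a 64-bit value from p, which may be unaligned. */
static inline uint64_t
get_unaligned_u64(const void *p)
{
    uint64_t x;
    assert(p != NULL);
    memcpy(&x, p, sizeof(x));
    return x;
}
```

Both wrappers have the same shape and size, so one would expect an inliner to
treat them alike.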

[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:ad5b757d99b5a121198b79a6a42c1f15ae86a190

commit r14-3085-gad5b757d99b5a121198b79a6a42c1f15ae86a190
Author: Uros Bizjak 
Date:   Tue Aug 8 18:53:51 2023 +0200

i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math
[PR110832]

Also introduce -m[no-]partial-vector-fp-math option to disable trapping
V2SF named patterns in order to avoid generation of partial vector V4SFmode
trapping instructions.

The new option is enabled by default, because even with sanitization,
a small but consistent speed up of 2 to 3% with Polyhedron capacita
benchmark can be achieved vs. scalar code.

Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
vs. scalar code.  This is what clang does by default, as it defaults
to -fno-trapping-math.

PR target/110832

gcc/ChangeLog:

* config/i386/i386.opt (mpartial-vector-fp-math): New option.
* config/i386/mmx.md (movq__to_sse): Do not sanitize
upper part of V2SFmode register with -fno-trapping-math.
(v2sf3): Enable for ix86_partial_vec_fp_math.
(divv2sf3): Ditto.
(v2sf3): Ditto.
(sqrtv2sf2): Ditto.
(*mmx_haddv2sf3_low): Ditto.
(*mmx_hsubv2sf3_low): Ditto.
(vec_addsubv2sf3): Ditto.
(vec_cmpv2sfv2si): Ditto.
(vcondv2sf): Ditto.
(fmav2sf4): Ditto.
(fmsv2sf4): Ditto.
(fnmav2sf4): Ditto.
(fnmsv2sf4): Ditto.
(fix_truncv2sfv2si2): Ditto.
(fixuns_truncv2sfv2si2): Ditto.
(floatv2siv2sf2): Ditto.
(floatunsv2siv2sf2): Ditto.
(nearbyintv2sf2): Ditto.
(rintv2sf2): Ditto.
(lrintv2sfv2si2): Ditto.
(ceilv2sf2): Ditto.
(lceilv2sfv2si2): Ditto.
(floorv2sf2): Ditto.
(lfloorv2sfv2si2): Ditto.
(btruncv2sf2): Ditto.
(roundv2sf2): Ditto.
(lroundv2sfv2si2): Ditto.
* doc/invoke.texi (x86 Options): Document
-mpartial-vector-fp-math option.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110832-1.c: New test.
* gcc.target/i386/pr110832-2.c: New test.
* gcc.target/i386/pr110832-3.c: New test.

[Bug rtl-optimization/110939] [14 Regression] 14.0 ICE at rtl.h:2297 while bootstrapping on loongarch64

2023-08-08 Thread stefansf at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110939

--- Comment #9 from Stefan Schulze Frielinghaus  
---
Thanks for the reproducer and sorry for the hassle.

The normal form of a constant for a mode with fewer bits than in HOST_WIDE_INT
is a sign extended version of the original constant.  This even holds for
unsigned constants which I missed.  The following should fix this:

diff --git a/gcc/combine.cc b/gcc/combine.cc
index e46d202d0a7..9e5bf96a09d 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12059,7 +12059,7 @@ simplify_compare_const (enum rtx_code code,
machine_mode mode,
   : (GET_MODE_SIZE (int_mode)
  - GET_MODE_SIZE (narrow_mode_iter)));
  *pop0 = adjust_address_nv (op0, narrow_mode_iter, offset);
- *pop1 = GEN_INT (n);
+ *pop1 = gen_int_mode (n, narrow_mode_iter);
  return adjusted_code;
}
 }

Can you give this a try?
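
Illustration (not GCC code): a sketch of the normal form described above,
where a constant for a mode narrower than the host integer is stored
sign-extended. canonical_for_width is a hypothetical helper, with int64_t
standing in for HOST_WIDE_INT; it is not gen_int_mode itself.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate n to `bits` bits, then sign-extend from bit (bits - 1).
   The final unsigned-to-signed conversion relies on the two's-complement
   wrapping that GCC documents for out-of-range values. */
static int64_t
canonical_for_width(uint64_t n, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    if (bits == 64)
        return (int64_t)n;
    uint64_t sign = UINT64_C(1) << (bits - 1);
    uint64_t mask = (sign << 1) - 1;
    n &= mask;
    /* (n ^ sign) - sign copies the sign bit into all higher bits. */
    return (int64_t)((n ^ sign) - sign);
}
```

For an 8-bit mode the unsigned constant 0xFF therefore has the normal form -1,
which is the case a plain, mode-unaware constant would get wrong.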

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #9 from Alexander Monakov  ---
(In reply to Alexander Monakov from comment #2)
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
> mistake anyway (if they are really meant to be non-static inlines, shouldn't
> there be a comment?)

Can you address this on the mbedtls side? Even if it doesn't help with the
observed slowdown, it will remain a problem for the future if left unfixed.

[Bug tree-optimization/110949] New: ((cast)cmp) - 1 should be tranformed into (cast)cmp` where cmp` is the inverse of cmp

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110949

Bug ID: 110949
   Summary: ((cast)cmp) - 1 should be tranformed into (cast)cmp`
where cmp` is the inverse of cmp
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
int f1(int a, int t)
{
  auto _6 = a == 115;
  auto _7 = (signed int) _6;
  return _6 - 1;
}
```
This should be the same as:
```
int f2(int a, int t)
{
  auto _6 = a != 115;
  auto _7 = (signed int) _6;
  return -_6;
}
```
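
Illustration (plain C restatement with hypothetical names): the two forms give
the same value for every input, since the comparison result is 0 or 1.

```c
#include <assert.h>

/* (int)(a == 115) - 1: 0 when a is 115, -1 otherwise. */
static int
cmp_minus_one(int a)
{
    int cmp = (a == 115);
    return cmp - 1;
}

/* -(int)(a != 115): the inverse comparison, negated. */
static int
neg_inverse_cmp(int a)
{
    int inv = (a != 115);
    return -inv;
}

/* The two forms must agree for any a. */
static void
check_equivalent(int a)
{
    assert(cmp_minus_one(a) == neg_inverse_cmp(a));
}
```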

[Bug middle-end/110950] New: RISC-V vector ICE in expand_const_vector

2023-08-08 Thread jeremy.bennett at embecosm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110950

Bug ID: 110950
   Summary: RISC-V vector ICE in expand_const_vector
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jeremy.bennett at embecosm dot com
  Target Milestone: ---

Created attachment 55708
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55708&action=edit
C source

The following code (testcase.c) causes an ICE when using RISC-V vector as
target.

a;
b() {
  long *c = 0;
  int *d;
  for (; a; ++a)
c[a] = d[-a];
}

Compiled with

riscv64-unknown-linux-gnu-gcc -march=rv64gcv -mabi=lp64d \
  -c -Ofast --param=riscv-autovec-preference=scalable \
  testcase.c

Output is

testcase.c:1:1: warning: data definition has no type or storage class
1 | a;
  | ^
testcase.c:1:1: warning: type defaults to 'int' in declaration of 'a'
[-Wimplicit-int]
testcase.c:2:1: warning: return type defaults to 'int' [-Wimplicit-int]
2 | b() {
  | ^
during RTL pass: expand
testcase.c: In function 'b':
testcase.c:6:10: internal compiler error: in expand_const_vector, at
config/riscv/riscv-v.cc:1510
6 | c[a] = d[-a];
  | ~^~~
0x8e70b6 expand_const_vector
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:1510
0x14a4df4 riscv_vector::legitimize_move(rtx_def*, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:1524
0x184044f gen_movrvvm8qi(rtx_def*, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/vector.md:1054
0xc56b57 rtx_insn* insn_gen_fn::operator()(rtx_def*,
rtx_def*) const
/home/jeremy/gittrees/mustang/gcc/gcc/recog.h:407
0xc56b57 emit_move_insn_1(rtx_def*, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:4164
0xc56f65 emit_move_insn(rtx_def*, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:4334
0xc2c4cd force_reg(machine_mode, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/explow.cc:693
0x14a2cee shuffle_generic_patterns
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:3120
0x14a2cee expand_vec_perm_const_1
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:3151
0x14a32b3 riscv_vector::expand_vec_perm_const(machine_mode, machine_mode,
rtx_def*, rtx_def*, rtx_def*, vec_perm_indices const&)
/home/jeremy/gittrees/mustang/gcc/gcc/config/riscv/riscv-v.cc:3203
0xefe0ce expand_vec_perm_const(machine_mode, rtx_def*, rtx_def*,
int_vector_builder > const&, machine_mode, rtx_def*)
/home/jeremy/gittrees/mustang/gcc/gcc/optabs.cc:6508
0xc4f682 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:10453
0xc53c58 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:10805
0xc4cb7a expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:9010
0xc4cb7a expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.h:310
0xc4cb7a expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:9345
0xc53c58 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:10805
0xc6062d store_expr(tree_node*, rtx_def*, int, bool, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:6325
0xc62201 expand_assignment(tree_node*, tree_node*, bool)
/home/jeremy/gittrees/mustang/gcc/gcc/expr.cc:6043
0xb1f05c expand_gimple_stmt_1
/home/jeremy/gittrees/mustang/gcc/gcc/cfgexpand.cc:3946
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

System information
--

Using built-in specs.
COLLECT_GCC=./riscv64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/home/jeremy/gittrees/mustang/install/libexec/gcc/riscv64-unknown-linux-gnu/14.0.0/lto-wrapper
Target: riscv64-unknown-linux-gnu
Configured with: /home/jeremy/gittrees/mustang/gcc/configure
--target=riscv64-unknown-linux-gnu
--prefix=/home/jeremy/gittrees/mustang/install
--with-sysroot=/home/jeremy/gittrees/mustang/install/sysroot
--with-pkgversion=gf9d93f8cc24 --with-system-zlib --enable-shared --enable-tls
--enable-languages=c,c++,fortran --disable-libmudflap --disable-libssp
--disable-libquadmath --disable-libsanitizer --disable-nls --disable-bootstrap
--src=/home/jeremy/gittrees/mustang/gcc --enable-multilib --with-abi=lp64d
--with-arch=rv64gc --with-tune= --with-isa-spec=20191213 'CFLAGS_FOR_TARGET=-O2
   -mcmodel=medany' 'CXX

[Bug middle-end/110950] RISC-V vector ICE in expand_const_vector

2023-08-08 Thread jeremy.bennett at embecosm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110950

--- Comment #1 from Jeremy Bennett  ---
Created attachment 55709
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55709&action=edit
Script to run the compilation

[Bug c++/110938] [11/12/13/14 Regression] miscompile if implicit special member is deleted and mutable

2023-08-08 Thread richard-gccbugzilla at metafoo dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110938

--- Comment #4 from Richard Smith  ---
Looks like the trait difference only happens if the templated constructor is
not deleted, but the ABI mismatch happens regardless. Possibly there are two
separate issues here?

[Bug c++/100482] namespaces as int in decltype expression

2023-08-08 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100482

--- Comment #3 from CVS Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:a90bd3ea6d1ba27b15476f0a768d7952c6723420

commit r14-3087-ga90bd3ea6d1ba27b15476f0a768d7952c6723420
Author: Nathaniel Shead 
Date:   Tue Aug 8 12:48:43 2023 +1000

c++: Report invalid id-expression in decltype [PR100482]

This patch ensures that any errors raised by finish_id_expression when
parsing a decltype expression are properly reported, rather than
potentially going ignored and causing invalid code to be accepted.

We can also now remove the separate check for templates without args as
this is also checked for in finish_id_expression.

PR c++/100482

gcc/cp/ChangeLog:

* parser.cc (cp_parser_decltype_expr): Report errors raised by
finish_id_expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-100482.C: New test.

Signed-off-by: Nathaniel Shead 

[Bug libstdc++/106611] std::is_nothrow_copy_constructible returns wrong result

2023-08-08 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106611

Arthur O'Dwyer  changed:

   What|Removed |Added

 CC||arthur.j.odwyer at gmail dot 
com

--- Comment #11 from Arthur O'Dwyer  ---
Jiang An wrote:
> I've mailed the LWG Chair to submit an LWG issue that requests clarification 
> of "is known not to throw any exceptions".
> FYI, there's at least one library implementor holding the same opinion as 
> yours.
> https://quuxplusone.github.io/blog/2023/04/17/noexcept-false-equals-default/

Quuxplusone here. :) I don't think this is LWG jurisdiction at all. This isn't
even a bug in libstdc++'s <type_traits>. This is purely a GCC core-language
bug. GCC's builtin __is_nothrow_constructible(T, T&&) simply returns the wrong
answer when the selected constructor is "trivial, but noexcept(false)."

// https://godbolt.org/z/5szW6KeWq
struct C {
  C(C&&) noexcept(false) = default;
};
static_assert(!__is_nothrow_constructible(C, C&&));
  // GCC+EDG fail; Clang+MSVC succeed

Notice that the builtin returns the correct answer when the selected
constructor is "non-trivial, noexcept(false), but still defaulted so we know it
can't throw." The problem is specifically with *trivial* ctors.

@jwakely, I propose that this issue should be recategorized as a compiler bug.
(And I'm also voting effectively "NAD" on LWG3967.)

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #10 from Alexander Monakov  ---
Ah, the non-static inlines are intentional, the corresponding extern
declarations appear in library/platform_util.c. Sorry, I missed that file the
first time around.

[Bug testsuite/110951] New: [13/14] RISCV: rv32 newlib gcc.c-torture testsuite fails with xgcc: fatal error: Cannot find suitable multilib set for '-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32d'

2023-08-08 Thread ewlu at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110951

Bug ID: 110951
   Summary: [13/14] RISCV: rv32 newlib gcc.c-torture testsuite
fails with xgcc: fatal error: Cannot find suitable
multilib set for
'-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32d'
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ewlu at rivosinc dot com
  Target Milestone: ---

Seeing the following error on newlib rv32 builds for most (all?)
gcc.c-torture/execute tests without the dg directives (for example,
gcc.c-torture/execute/20031020-1.c).

The error message
xgcc: fatal error: Cannot find suitable multilib set for
'-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32d'
results in a compilation failure.

Confirmed that the error exists on gcc-13.2.0 and gcc-14. 

An example can be found at
https://github.com/patrick-rivos/riscv-gnu-toolchain/issues/137

[Bug libstdc++/106611] std::is_nothrow_copy_constructible returns wrong result

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106611

--- Comment #12 from Andrew Pinski  ---
I suspect this is a dup of bug 100470 then.

[Bug testsuite/110951] [13/14] RISCV: rv32 newlib gcc.c-torture testsuite fails with xgcc: fatal error: Cannot find suitable multilib set for '-march=rv32imafdc_zicsr_zifencei'/'-mabi=ilp32d'

2023-08-08 Thread ewlu at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110951

--- Comment #1 from Edwin Lu  ---
On rv32 newlib GCC 12 the issue is different and might be unrelated:

- ABI is incompatible with that of the selected emulation:
  target emulation `elf64-littleriscv' does not match `elf32-littleriscv'
- ./20031020-1.exe(.text.exit): relocation "_exit+0x0 (type R_RISCV_CALL_PLT)"
goes out of range
- file class ELFCLASS64 incompatible with ELFCLASS32
- final link failed: file in wrong format

On rv32 linux with GCC 13/14, the issue is also different and might be unrelated:
FAIL: gcc.c-torture/execute/20031020-1.c   -O0  (test for excess errors)
Testing execute/20031020-1.c,   -O1
Executing on host:
/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/xgcc
-B/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/ 
/scratch/ewlu/ci/triage/torture/gcc/gcc/testsuite/gcc.c-torture/execute/20031020-1.c
 -march=rv32gc -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   -O1
 -w  -lm  -o ./20031020-1.exe    (timeout = 600)
output is In file included from
/scratch/ewlu/ci/triage/torture/build/sysroot/usr/include/features.h:515,
 from
/scratch/ewlu/ci/triage/torture/build/sysroot/usr/include/bits/libc-header-start.h:33,
 from
/scratch/ewlu/ci/triage/torture/build/sysroot/usr/include/limits.h:26,
 from
/scratch/ewlu/ci/triage/torture/build/lib/gcc/riscv64-unknown-linux-gnu/14.0.0/include/limits.h:205,
 from
/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/include/limits.h:205,
 from
/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/include/syslimits.h:7,
 from
/scratch/ewlu/ci/triage/torture/build/build-gcc-linux-stage2/gcc/include/limits.h:34,
 from
/scratch/ewlu/ci/triage/torture/gcc/gcc/testsuite/gcc.c-torture/execute/20031020-1.c:6:
/scratch/ewlu/ci/triage/torture/build/sysroot/usr/include/gnu/stubs.h:11:11:
fatal error: gnu/stubs-ilp32d.h: No such file or directory
compilation terminated.
 status 1
Checking pattern "sparc-*-sunos*" with x86_64-pc-linux-gnu
Checking pattern "alpha*-*-*" with x86_64-pc-linux-gnu
Checking pattern "hppa*-*-hpux*" with x86_64-pc-linux-gnu
compiler exited with status 1

[Bug tree-optimization/108397] Missed optimization with [0, 0][-1U,-1U] range arithmetics

2023-08-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108397

--- Comment #2 from Andrew Pinski  ---
I think the simple missed optimization here is:
```
int f2(int a)
{
  if (a != -1 && a != 0)
    __builtin_unreachable();
  unsigned c = a;
  if (c > 121212)
    return 1;
  return 0;
}
```
This should be optimized to just:
```
int f2_(int a)
{
  return a != 0;
}
```


That is, if we have:
```
  # RANGE [irange] unsigned int [0, 0][+INF, +INF]
  a.0_1 = (unsigned int) a_4(D);
  if (a.0_1 > 121212)
```
Since a.0_1 can take only two values, we can just compare against one or the
other in the above case.

That in turn optimizes the original testcase: we can optimize away the
nop_convert and then the negation, which leaves:
t1 <= t1
and that folds trivially to true.
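
The two-value argument above can be checked by hand. The sketch below is not how GCC implements the fold. It only enumerates the two values the range allows and confirms that the comparison against 121212 agrees with a comparison against either endpoint. All function names are hypothetical.

```cpp
#include <cassert>

// The comparison as it appears in the testcase: convert to unsigned,
// then compare against a constant lying strictly between the two
// possible values, 0 and UINT_MAX.
inline bool original_test(int a) {
    unsigned c = static_cast<unsigned>(a);
    return c > 121212u;
}

// Compare against the low value instead.
inline bool compare_to_zero(int a) { return a != 0; }

// Or compare against the high value; (unsigned)-1 is UINT_MAX.
inline bool compare_to_minus_one(int a) { return a == -1; }

// True when all three forms agree for every value the range allows.
inline bool forms_agree_on_range() {
    const int allowed[] = {-1, 0};
    for (int a : allowed) {
        if (original_test(a) != compare_to_zero(a)) return false;
        if (original_test(a) != compare_to_minus_one(a)) return false;
    }
    return true;
}

// Outside the range the rewrite is not valid, so it is the range
// information that licenses the simplification.
inline void check_range_argument() {
    assert(forms_agree_on_range());
    assert(original_test(5) != compare_to_zero(5));
}
```

Calling `check_range_argument()` exercises both halves: the forms agree on `{-1, 0}`, and they disagree on a value such as 5 that the `__builtin_unreachable()` guard rules out.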
