date:20250220

[Bug target/118945] RISC-V: VSETL pass: Don't promote Vectors ops from Tail agnostic to Tail Undisturbed

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2025-02-20

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #15 from Sam James  ---
(In reply to David Malcolm from comment #14)
> FWIW I tried again building emacs (from git) with gcc trunk with
> --with-native-compilation=aot on x86_64 and, annoyingly, "make" completed
> successfully; I see lots of
>./native-lisp/31.0.50-677d9325/*.eln 
> which are "ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> statically linked, not stripped"
> 

I'll get back to trying to find how to configure GCC s.t. it happens. It seems
like in the right environment, it always happens, I just don't know what the
condition is yet.

> How clean is Emacs under valgrind normally?

It's clean "enough" if you...
a) pass -DUSE_VALGRIND in CFLAGS or CPPFLAGS when building, and
b) use a suppression file (like
https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-01/txtaJC0QpICF7.txt,
which isn't perfect, but it made the output mostly clean for me)

When I ran the crasher under Valgrind, the only output I saw besides GC noise
at the beginning was the invalid access on the null deref. I didn't see
anything that looked useful or around the time of the crash, and the bit I did
see seemed like the usual innocent GC noise for Emacs.

[Bug target/118945] RISC-V: VSETL pass: Don't promote Vectors ops from Tail agnostic to Tail Undisturbed

2025-02-20 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945

--- Comment #9 from JuzheZhong  ---
(In reply to Andrew Waterman from comment #8)
> >  In fact, I'd be rather surprised to see anything preferring tail 
> > undisturbed.
> 
> Right.  To be precise, microarchitectures without register renaming
> absolutely do prefer to leave the tail undisturbed.  But that's why the ISA
> defines the agnostic mode in such a way that undisturbed is a valid
> implementation of agnostic.  (The in-order microarchitectures I've worked on
> simply ignore the tail-/mask-agnostic setting; the state bits that control
> the mode are essentially vestigial.)
> 
> Since no plausible implementation will benefit from being in undisturbed
> mode, we don't need to consider that aspect of the problem, but...
> 
> > I prefer fewer "vsetvli" (which allows more fusion) by default.
> 
> ...but here's the rub.  Implementations that don't benefit from the agnostic
> setting would definitely prefer to avoid the extra setvl instructions, not
> because they're expensive, but because they're not free.
> 
> > Some designs aren't sensitive to the number of vsetvls and I would expect 
> > that over time that's where high performance designs will land over time.
> 
> Low-performance ones, too.  (Making vset[i]vli fast is more of an
> engineering cost than a silicon cost.)  But the instructions still have to
> be fetched and decoded, and registers have to be read and written, so the
> perf cost will converge on that of, say, an ADDI instruction, which is to
> say cheap but not zero.  For narrow-issue machines, this does matter.
> 
> > Obviously for your design you'll want to set the knob which says "minimize 
> > vsetvls" as opposed to "avoid false dependencies by preferring tail 
> > agnostic". That's easily handled by putting the data in the tuning 
> > structure for each design.
> 
> And so this is the right answer :)

In my uarch, "vsetvli" is cheap but not zero-cost, which is pretty similar to
ADDI. As Andrew said, for in-order microarchitectures you can't ignore the
cost of "vsetvli"; that's why I prefer to keep the original "vsetvli" strategy
(which is fusing as many "vsetvli"s as possible) by default.

For example, you should test on the Banana Pi K1 which is better: "keep
agnostic but more vsetvli" vs. "allow aggressive fusion into a single
undisturbed vsetvli".

Also, the example shown in the PR is not appropriate for making a decision
here, since it produces just 1 more vsetvli when you disable aggressive fusion
into undisturbed, which does not seem very costly.

I think we should consider many more different situations, and consider them
carefully. Like:

vsetvli ... e8,mf8 ta ma (demand ratio)
...
vsetvli zero zero e32 mf2 tu ma (demand ratio)
...
vsetvli zero zero e64 m1 ta ma (demand SEW and LMUL)
...
vsetvli zero zero e64 m1 ta mu (demand ratio)
...
vsetvli zero zero e16 mf4 tu mu (demand ratio)
...
vsetvli zero zero e32 mf2 ta ma (demand ratio)
...
vsetvli zero zero e8 mf8 ta ma (demand ratio)

In the current strategy, these 7 "vsetvli"s will be fused into 1 single
"vsetvli":

vsetvli ... e64 m1 tu mu

However, if you just keep agnostic and do not allow fusing, you will end up
with 6 more "vsetvli"s. I don't think this codegen can be better on any
micro-architecture design.
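
To make the arithmetic behind that fusion concrete, here is a minimal sketch of a demand-merging rule. It is a toy model, not GCC's vsetvl pass: the `Demand` struct, `ratio`, `fuse` and `pr_example_demands` are hypothetical names, and the rule assumed is simply "all SEW/LMUL ratios must match, an exact SEW/LMUL demand wins, and tu/mu win over ta/ma" (undisturbed being a valid implementation of agnostic).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of one "vsetvli" demand (not GCC's data structure).
struct Demand {
  int sew;                // element width in bits: e8 -> 8, e64 -> 64
  int lmul_num;           // LMUL numerator:   m1 -> 1/1, mf8 -> 1/8
  int lmul_den;           // LMUL denominator
  bool tail_undisturbed;  // true for "tu", false for "ta"
  bool mask_undisturbed;  // true for "mu", false for "ma"
  bool exact_sew_lmul;    // true when SEW and LMUL themselves are demanded
};

// SEW/LMUL ratio; instructions with equal ratios see the same VLMAX.
int ratio(const Demand& d) {
  assert(d.lmul_num > 0 && d.lmul_den > 0);
  return d.sew * d.lmul_den / d.lmul_num;
}

// Try to fuse all demands into one. Returns false if they are incompatible
// under the assumed rule described in the lead-in.
bool fuse(const std::vector<Demand>& in, Demand& out) {
  if (in.empty()) return false;
  out = in.front();
  for (const Demand& d : in) {
    if (ratio(d) != ratio(out)) return false;
    if (d.exact_sew_lmul) {
      bool differs = d.sew != out.sew || d.lmul_num != out.lmul_num ||
                     d.lmul_den != out.lmul_den;
      if (out.exact_sew_lmul && differs) return false;
      out.sew = d.sew;
      out.lmul_num = d.lmul_num;
      out.lmul_den = d.lmul_den;
      out.exact_sew_lmul = true;
    }
    // Undisturbed satisfies an agnostic demand, so tu/mu are "sticky".
    out.tail_undisturbed = out.tail_undisturbed || d.tail_undisturbed;
    out.mask_undisturbed = out.mask_undisturbed || d.mask_undisturbed;
  }
  return true;
}

// The seven demands from the comment above, in order.
std::vector<Demand> pr_example_demands() {
  return {
    {8, 1, 8, false, false, false},   // e8  mf8 ta ma (ratio)
    {32, 1, 2, true, false, false},   // e32 mf2 tu ma (ratio)
    {64, 1, 1, false, false, true},   // e64 m1  ta ma (SEW and LMUL)
    {64, 1, 1, false, true, false},   // e64 m1  ta mu (ratio)
    {16, 1, 4, true, true, false},    // e16 mf4 tu mu (ratio)
    {32, 1, 2, false, false, false},  // e32 mf2 ta ma (ratio)
    {8, 1, 8, false, false, false},   // e8  mf8 ta ma (ratio)
  };
}
```

Every demand in the list has a SEW/LMUL ratio of 64, which is why this model can collapse them into a single e64 m1 tu mu setting. Whether that is the better trade-off on a given core is exactly the tuning question discussed in this PR.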

[Bug rtl-optimization/118946] Missed optimization: GCC reserves stack space for optimized-out variable

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118946

--- Comment #4 from Andrew Pinski  ---
(In reply to Jeffrey A. Law from comment #2)
> Marking as a duplicate of one I happen to know about.  I suspect there are
> others.
> 
> *** This bug has been marked as a duplicate of bug 94713 ***

I think this is unrelated. The issue here is the tree level can't optimize away
the memcpy into a memset. This is a dup of bug 117634 really.

*** This bug has been marked as a duplicate of bug 117634 ***

[Bug target/118934] [15 Regression] RISC-V: ICE: output_operand: invalid expression as operand

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118934

Jeffrey A. Law  changed:

   What|Removed |Added

   Last reconfirmed||2025-02-20
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #1 from Jeffrey A. Law  ---
We need a testcase.  If you use cvise to reduce the testcase I bet the final
result will be trivial in nature and probably trivial to obfuscate if you want.
 Note that WRF sources are generally available (http://www.wrf-model.org), so
obfuscation may not be strictly necessary.

[Bug tree-optimization/118947] Missed optimization: GCC forgets stack buffer contents across function call

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118947

--- Comment #4 from Andrew Pinski  ---
Created attachment 60551
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60551&action=edit
Fixes bbb's stack usage

This patch fixes the stack usage of the bbb function in comment #0.

[Bug tree-optimization/118963] [13/14/15 regression] Miscompile at -O2/3 since r13-6945-g429a7a88438cc8

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118963

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 118922 ***

[Bug tree-optimization/118922] [13/14/15 regression] Miscompile at -O2/3 since r13-6945-g429a7a88438cc8

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118922

--- Comment #9 from Andrew Pinski  ---
*** Bug 118963 has been marked as a duplicate of this bug. ***

[Bug preprocessor/118860] [15 Regression] ICE Segfault with --param=file-cache-files= since r15-7431-g66af77cbed6c5b

2025-02-20 Thread heiko at hexco dot de via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118860

Heiko Eißfeldt  changed:

   What|Removed |Added

 CC||heiko at hexco dot de

--- Comment #2 from Heiko Eißfeldt  ---
$ g++ pr31078.C --param=file-cache-files=16 -Wunused
works

$ g++ pr31078.C --param=file-cache-files=17 -Wunused
does not

Looks like file_cache::tune() needs to reallocate when num_file_slots_ is
greater than the default
size_t file_cache::num_file_slots = 16;
used by the constructor

file_cache::file_cache ()
: m_file_slots (new file_cache_slot[num_file_slots])
{
  initialize_input_context (nullptr, false);
}
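
As an illustration of the suspected problem, here is a minimal sketch of a cache whose tuning step reallocates its slot array when the requested count changes. It is not GCC's `file_cache` and not a proposed patch: `slot`, `slot_cache`, `capacity` and `at` are hypothetical names, and the capacity is tracked per object here rather than in a static member.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for one cache slot.
struct slot {
  int uses = 0;
};

class slot_cache {
public:
  // Like the constructor quoted above, allocate the default number of slots.
  slot_cache() : m_slots(new slot[default_slots]), m_capacity(default_slots) {}

  // Change the slot count. The array is reallocated so that the storage
  // always matches the count; skipping this step would leave slot 16 and
  // above out of bounds after tune(17).
  void tune(std::size_t num_slots) {
    assert(num_slots > 0);
    if (num_slots != m_capacity) {
      m_slots.reset(new slot[num_slots]);
      m_capacity = num_slots;
    }
  }

  std::size_t capacity() const { return m_capacity; }

  slot& at(std::size_t i) {
    assert(i < m_capacity);
    return m_slots[i];
  }

private:
  static constexpr std::size_t default_slots = 16;
  std::unique_ptr<slot[]> m_slots;
  std::size_t m_capacity;
};
```

With this shape, asking for 17 slots after construction is safe because the backing array grows along with the count.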

[Bug rtl-optimization/118947] Missed optimization: GCC forgets stack buffer contents across function call

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118947

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117634
   Severity|normal  |enhancement
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
   Last reconfirmed||2025-02-20
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
Related to PR 117634.

Currently optimize_memcpy_to_memset does not skip over vdefs that can't
clobber `buf`, as it is supposed to be a simple analysis. But I suspect we could
extend it to skip over vdefs that don't clobber the buf.
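
A hypothetical example of the pattern in question (this is not the testcase from comment #0; `opaque_call`, `opaque_calls` and `copy_zeroed_buffer` are made-up names) might look like the sketch below. The call sits between the memset and the memcpy and is a vdef, but since `buf` never escapes, the call cannot modify it.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for a function the optimizer cannot see into. In a real testcase
// it would be defined in another translation unit.
int opaque_calls = 0;
void opaque_call() { ++opaque_calls; }

void copy_zeroed_buffer(char* dst) {
  assert(dst != nullptr);
  char buf[32];
  std::memset(buf, 0, sizeof buf);
  opaque_call();  // may write memory (a vdef), but cannot reach buf
  // buf is still all zeros here, so this copy is equivalent to
  // std::memset(dst, 0, 32) and buf itself would no longer be needed.
  std::memcpy(dst, buf, sizeof buf);
}
```

If the analysis skipped the intervening call, the memcpy could be rewritten as a memset and the stack buffer dropped.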

[Bug tree-optimization/118963] [13/14/15 regression] Miscompile at -O2/3 since r13-6945-g429a7a88438cc8

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118963

Sam James  changed:

   What|Removed |Added

   Target Milestone|--- |13.4
   Summary|Miscompile at -O2/3 |[13/14/15 regression] Miscompile at -O2/3 since r13-6945-g429a7a88438cc8
   Keywords||wrong-code
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118922
   Component|c   |tree-optimization

[Bug target/80878] -mcx16 (enable 128 bit CAS) on x86_64 seems not to work on 7.1.0

2025-02-20 Thread lh_mouse at 126 dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

--- Comment #47 from LIU Hao  ---
(In reply to Luke Dalessandro from comment #46)
> But if 104688 isn't related to this issue, and thus Jakub's comment was in
> error, I definitely don't understand the underlying problem and why clang is
> fine doing it.

The issue here is that if an atomic load is implemented with a call to
libatomic routines, then it's incorrect to implement CAS without a call.

[Bug target/118949] [15 regression] RISC-V: Extra FRM writes since GCC-14.2 since r15-5943-gdc0dea98c96e02

2025-02-20 Thread pan2.li at intel dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118949

--- Comment #5 from Li Pan  ---
Thanks Vineet. Here is another case, with an explicit conversion. It is
unrelated to the global_reg change.

#define T float

void func(const T * restrict a, const T * restrict b,
          T * restrict c)
{
    for (long i = 0; i < 1024; ++i) {
        double a_d = (double)a[i];
        double b_d = (double)b[i];

        long a_l = __builtin_lround(a_d);
        long b_l = __builtin_lround(b_d);

        c[i] = (T)(a_l + b_l);
    }
}

The diff mostly occurs after the vect pass.

from:

vect__4.9_36 = .MASK_LEN_LOAD (vectp_a.7_38, 32B, { -1, ... }, _11, 0);
vect__6.12_32 = .MASK_LEN_LOAD (vectp_b.10_34, 32B, { -1, ... }, _11, 0)

vect_a_l_15.13_31 = .LROUND (vect__4.9_36);
vect_b_l_16.14_30 = .LROUND (vect__6.12_32);
vect__7.15_29 = vect_a_l_15.13_31 + vect_b_l_16.14_30;
vect__9.16_28 = (vector([2,2]) float) vect__7.15_29;
.MASK_LEN_STORE (vectp_c.17_26, 32B, { -1, ... }, _11, 0, vect__9.16_28);

to:

vect__4.9_43 = .MASK_LEN_LOAD (vectp_a.7_46, 32B, { -1, ... }, _44(D), _23, 0);
vect_a_d_14.10_42 = (vector([2,2]) double) vect__4.9_43; // Only in GCC-15
vect_a_l_17.11_41 = .LROUND (vect_a_d_14.10_42);

vect__6.14_36 = .MASK_LEN_LOAD (vectp_b.12_39, 32B, { -1, ... }, _37(D), _23, 0);
vect_b_d_16.15_35 = (vector([2,2]) double) vect__6.14_36; // Only in GCC-15
vect_b_l_18.16_34 = .LROUND (vect_b_d_16.15_35);

vect__7.17_33 = vect_a_l_17.11_41 + vect_b_l_18.16_34;
vect__9.18_32 = (vector([2,2]) float) vect__7.17_33;

.MASK_LEN_STORE (vectp_c.19_30, 32B, { -1, ... }, _23, 0, vect__9.18_32);

Looks like there are more conversions after the load...

[Bug tree-optimization/107263] Memcpy not elided when initializing struct

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263

--- Comment #4 from Andrew Pinski  ---
(In reply to AK from comment #3)
> Seems like a duplicate of #59863 ?

No, different issue. There we have an array which is constant all the way, but
here we have a non-constant part.

[Bug middle-end/23782] SRA pessimizes passing structures by value at -Os (+22% code size)

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23782

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
  Component|target  |middle-end

--- Comment #8 from Andrew Pinski  ---
.

[Bug target/118955] Fortran uses vector math functions without -ffast-math

2025-02-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118955

--- Comment #9 from Richard Biener  ---
I have also always wondered about that glibc guard, esp. it being the
kitchen-sink fast-math guard rather than something more specific (yep, we don't
have anything for -funsafe-math-optimizations).  That is, I suppose glibc does
not set FP exception flags "correctly" either.

Is there documentation on what you can expect from the glibc vector math
functions with regard to IEEE conformance?

[Bug libfortran/118935] Segmentation fault in 'libgomp.fortran/rwlock_1.f90' when compiling libgfortran with '-O0'

2025-02-20 Thread tkoenig at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118935

--- Comment #10 from Thomas Koenig  ---
What does the OpenMP standard say about I/O in parallel execution?

[Bug target/118952] AArch64 get_fpcr and set_fpcr builtins don't block reordering of operations past them

2025-02-20 Thread rsandifo at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118952

Richard Sandiford  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
 CC||rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Sandiford  ---
I think this is essentially the same problem as PR34678.

[Bug libstdc++/118559] [15 Regression] __array_rank is broken for clang so need workaround in libstdc++

2025-02-20 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118559

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Jonathan Wakely  ---
Fixed

[Bug libstdc++/118559] [15 Regression] __array_rank is broken for clang so need workaround in libstdc++

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118559

--- Comment #1 from GCC Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:c0e865f73ddee2e7247a23a7d57ad80261861d35

commit r15-7650-gc0e865f73ddee2e7247a23a7d57ad80261861d35
Author: Jonathan Wakely 
Date:   Wed Feb 19 14:46:32 2025 +

libstdc++: Workaround Clang bug with __array_rank built-in [PR118559]

We started using the __array_rank built-in with r15-1252-g6f0dfa6f1acdf7
but that built-in is buggy in versions of Clang up to and including 19.

libstdc++-v3/ChangeLog:

PR libstdc++/118559
* include/std/type_traits (rank, rank_v): Do not use
__array_rank for Clang 19 and older.
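
For readers unfamiliar with this kind of workaround, the following is a minimal sketch of gating a built-in trait on the compiler version, with a plain template fallback. It is not the libstdc++ code: `sketch_rank` and `SKETCH_USE_ARRAY_RANK` are hypothetical names, and the "Clang 20 or newer" condition is only inferred from the commit message's "up to and including 19".

```cpp
#include <cstddef>
#include <type_traits>

// Use the built-in only where it exists and, on Clang, only in versions
// newer than those the commit message describes as buggy.
#if defined(__has_builtin)
# if __has_builtin(__array_rank) && (!defined(__clang__) || __clang_major__ >= 20)
#  define SKETCH_USE_ARRAY_RANK 1
# endif
#endif

#ifdef SKETCH_USE_ARRAY_RANK
template<typename T>
struct sketch_rank
  : std::integral_constant<std::size_t, __array_rank(T)> {};
#else
// Portable fallback: peel off one array dimension per specialization.
template<typename T>
struct sketch_rank : std::integral_constant<std::size_t, 0> {};

template<typename T, std::size_t N>
struct sketch_rank<T[N]>
  : std::integral_constant<std::size_t, 1 + sketch_rank<T>::value> {};

template<typename T>
struct sketch_rank<T[]>
  : std::integral_constant<std::size_t, 1 + sketch_rank<T>::value> {};
#endif
```

Either branch should give the same answers; the guard only decides how they are computed.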

[Bug libstdc++/118855] Simplify when __builtin_*g builtins are available

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118855

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:e8ad697a75b0870a833366daf687668a57cabb6e

commit r15-7648-ge8ad697a75b0870a833366daf687668a57cabb6e
Author: Jonathan Wakely 
Date:   Wed Feb 19 14:48:04 2025 +

libstdc++: Use new type-generic built-ins in <bit> [PR118855]

This makes several functions in <bit> faster to compile, with fewer
expressions to parse and fewer instantiations of __numeric_traits
required.

libstdc++-v3/ChangeLog:

PR libstdc++/118855
* include/std/bit (__count_lzero, __count_rzero, __popcount):
Use type-generic built-ins when available.
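
As a rough illustration of what "type-generic built-ins when available" can look like, here is a sketch that uses `__builtin_clzg` and `__builtin_popcountg` when the compiler reports them and falls back to simple loops otherwise. This is not the libstdc++ implementation: `sketch_countl_zero`, `sketch_popcount` and the `SKETCH_HAVE_*` macros are hypothetical names, and the fallbacks assume C++14 or later for loops in constexpr functions.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

#if defined(__has_builtin)
# if __has_builtin(__builtin_clzg)
#  define SKETCH_HAVE_CLZG 1
# endif
# if __has_builtin(__builtin_popcountg)
#  define SKETCH_HAVE_POPCOUNTG 1
# endif
#endif

// Count leading zero bits of an unsigned value; returns the bit width for 0.
template<typename T>
constexpr int sketch_countl_zero(T x) noexcept {
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  constexpr int digits = std::numeric_limits<T>::digits;
#ifdef SKETCH_HAVE_CLZG
  // One call handles every unsigned type; the second argument is the
  // result to return when x is zero.
  return __builtin_clzg(x, digits);
#else
  int n = 0;
  for (T bit = T(T(1) << (digits - 1)); bit != 0 && (x & bit) == 0;
       bit = T(bit >> 1))
    ++n;
  return n;
#endif
}

// Count set bits of an unsigned value.
template<typename T>
constexpr int sketch_popcount(T x) noexcept {
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
#ifdef SKETCH_HAVE_POPCOUNTG
  return __builtin_popcountg(x);
#else
  int n = 0;
  for (; x != 0; x = T(x & (x - 1)))  // clear the lowest set bit each round
    ++n;
  return n;
#endif
}
```

The point of the type-generic form is that no per-width dispatch on the argument type is needed, which is where the compile-time savings mentioned in the commit message would come from.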

[Bug libstdc++/118855] Simplify when __builtin_*g builtins are available

2025-02-20 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118855

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Jonathan Wakely  ---
Done

[Bug libstdc++/104928] std::counting_semaphore on Linux can sleep forever

2025-02-20 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|15.0|16.0

[Bug c++/118951] New: __FILE__ inserts the filename as array, __builtin_FILE as pointer

2025-02-20 Thread fabian_kessler at gmx dot de via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118951

Bug ID: 118951
   Summary: __FILE__ inserts the filename as array, __builtin_FILE
as pointer
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fabian_kessler at gmx dot de
  Target Milestone: ---

__builtin_FILE should return the FILE_PATH as array, not as pointer.
This wouldn't have any drawbacks, since an array can decay to a pointer,
but it is currently not possible to go the other way round for 0-terminated
strings.

Returning __builtin_FILE as array will allow code like the following:

```
template 
struct strong_alias<>{/*...*/};
```

In the current situation, it is only possible to do this via macros.

Also add __builtin_FILE_NAME, which is supported by clang.
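
The difference being described can be checked with a few type traits. The sketch below is only an illustration of the array-versus-pointer distinction; `file_macro_is_array`, `builtin_file_is_pointer` and `literal_length` are hypothetical names, not part of any proposal in this report.

```cpp
#include <cstddef>
#include <type_traits>

// __FILE__ expands to a string literal, i.e. an lvalue of type const char[N].
constexpr bool file_macro_is_array =
    std::is_array<std::remove_reference<decltype(__FILE__)>::type>::value;

// __builtin_FILE() yields a const char*, so the length is not part of the type.
constexpr bool builtin_file_is_pointer =
    std::is_same<decltype(__builtin_FILE()), const char*>::value;

// Only an array argument lets a template deduce the length at compile time.
template<std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
  return N - 1;  // exclude the terminating 0
}
```

`literal_length(__FILE__)` compiles, whereas `literal_length(__builtin_FILE())` does not, which is the asymmetry the report is about.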

[Bug target/118952] New: AArch64 get_fpcr and set_fpcr builtins don't block reordering of operations past them

2025-02-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118952

Bug ID: 118952
   Summary: AArch64 get_fpcr and set_fpcr builtins don't block
reordering of operations past them
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

The __builtin_aarch64_set_fpcr and __builtin_aarch64_get_fpcr builtins are not
as useful in practice as we'd like. For the input code:

#include <stdint.h>
#include <string.h>

uint64_t foo (uint32_t *in_fpcr, uint32_t src) {
    uint64_t dst;
    uint32_t saved_fpcr;
    float fsrc;

    saved_fpcr = __builtin_aarch64_get_fpcr();
    __builtin_aarch64_set_fpcr(*in_fpcr);

    memcpy(&fsrc, &src, 4);
    double d = (double) fsrc;
    memcpy(&dst, &d, 8);

    *in_fpcr = __builtin_aarch64_get_fpcr();
    __builtin_aarch64_set_fpcr(saved_fpcr);
    return dst;
}

at -O2 we get:
foo:
        fmov    s31, w1
        mrs     x1, fpcr
        ldr     w2, [x0]
        msr     fpcr, x2
        mrs     x2, fpcr
        str     w2, [x0]
        msr     fpcr, x1
        fcvt    d31, s31
        fmov    x0, d31
        ret

The problem is that the fcvt is moved outside the region that has a modified
FPCR, defeating the purpose of the builtins.
I initially thought this was the RTL insn scheduler moving the operations but
the RTL patterns for the builtins do use unspec_volatile that is supposed to
prevent such movement.
But the problem seems to be at expand-time. The GIMPLE looks correct:

  saved_fpcr_6 = __builtin_aarch64_get_fpcr ();
  _1 = *in_fpcr_7(D);
  __builtin_aarch64_set_fpcr (_1);
  _14 = VIEW_CONVERT_EXPR(src_9(D));
  _2 = (double) _14;
  _10 = VIEW_CONVERT_EXPR(_2);
  _3 = __builtin_aarch64_get_fpcr ();
  *in_fpcr_7(D) = _3;
  __builtin_aarch64_set_fpcr (saved_fpcr_6);
  return _10;

but the RTL generation is:
(insn 2 5 3 2 (set (reg/v/f:DI 107 [ in_fpcr ])
(reg:DI 0 x0 [ in_fpcr ])) "fpcr.c":4:48 -1
 (nil))
(insn 3 2 4 2 (set (reg/v:SI 108 [ src ])
(reg:SI 1 x1 [ src ])) "fpcr.c":4:48 -1
 (nil))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg/v:SI 104 [ saved_fpcr ])
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_GET_FPCR)) "fpcr.c":9:18 -1
 (nil))
(insn 8 7 9 2 (set (reg:SI 109)
(mem:SI (reg/v/f:DI 107 [ in_fpcr ]) [1 *in_fpcr_7(D)+0 S4 A32])) "fpcr.c":10:5 -1
 (nil))
(insn 9 8 10 2 (unspec_volatile [
(reg:SI 109)
] UNSPECV_SET_FPCR) "fpcr.c":10:5 -1
 (nil))
(insn 10 9 11 2 (set (reg:SI 103 [ _3 ])
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_GET_FPCR)) "fpcr.c":16:16 -1
 (nil))
(insn 11 10 12 2 (set (mem:SI (reg/v/f:DI 107 [ in_fpcr ]) [1 *in_fpcr_7(D)+0 S4 A32])
(reg:SI 103 [ _3 ])) "fpcr.c":16:14 discrim 1 -1
 (nil))
(insn 12 11 13 2 (unspec_volatile [
(reg/v:SI 104 [ saved_fpcr ])
] UNSPECV_SET_FPCR) "fpcr.c":17:5 -1
 (nil))
(insn 13 12 14 2 (set (reg:DF 111 [ _2 ])
(float_extend:DF (subreg:SF (reg/v:SI 108 [ src ]) 0))) "fpcr.c":13:16 -1
 (nil))
(insn 14 13 18 2 (set (reg:DI 106 [  ])
(subreg:DI (reg:DF 111 [ _2 ]) 0)) "fpcr.c":18:12 -1
 (nil))

insn 13 has been moved past the GET_FPCR and SET_FPCR builtins. Is that
something the out-of-ssa code is doing?

[Bug tree-optimization/118521] [15 regression] std::vector Wstringop-overflow false positive since r15-4473

2025-02-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118521

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118817

--- Comment #9 from Richard Biener  ---
Looks similar to PR118817 btw.  As there, we're diagnosing from the strlen
pass, which has a somewhat unfortunate pass position.

 [local count: 131235111]:
_53 = operator new (2);

 [local count: 131235111]:
MEM  [(char * {ref-all})_53] = MEM  [(char
* {ref-all})&C.0];
__result_46 = _53 + 2;
_150 = operator new (4);
goto ; [100.00%]

 [local count: 131235112]:
_97 = _150 + 2;
__builtin_memset (_97, 0, 2);
MEM  [(char * {ref-all})_150] = 513;
__result_274 = _150 + 1;
__new_finish_106 = __result_274 + 3;
operator delete (_53, 2);
_115 = _150 + 4;
if (__new_finish_106 != _115)
  goto ; [82.57%]
else
  goto ; [17.43%]

 [local count: 108360832]:
MEM[(char *)_97 + 2B] = 1;

like in the other PR we are missing the power of forwprop which would have
accumulated the constant adjustments, eliding the BB6 enter condition.
As you say it's SCCP exposing the opportunity.

Neither FRE nor DOM have the ability to prove equivalence on larger
expressions like this, aka (_150 + 4) == ((_150 + 1) + 3) but they
instead rely on instruction combinations.
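
To illustrate what "accumulating the constant adjustments" buys, here is a toy model in which a pointer expression is kept as a base plus one folded constant offset. It is only a sketch of the idea, not how forwprop, FRE or DOM are implemented; `PtrExpr`, `pointer_plus` and `same_address` are hypothetical names.

```cpp
// A pointer value modelled as "SSA base + accumulated constant offset".
struct PtrExpr {
  int base;     // identifies the base SSA name, e.g. 150 for _150
  long offset;  // sum of all constant adjustments applied so far
};

// Applying "+ k" folds the constant into the running offset instead of
// creating a new, opaque expression.
constexpr PtrExpr pointer_plus(PtrExpr p, long k) {
  return PtrExpr{p.base, p.offset + k};
}

// With offsets accumulated, equality becomes a field-by-field comparison.
constexpr bool same_address(PtrExpr a, PtrExpr b) {
  return a.base == b.base && a.offset == b.offset;
}
```

In this model (_150 + 1) + 3 and _150 + 4 both normalize to base 150 with offset 4, so the `!=` test guarding the final store would fold to false.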

Now, FRE does "fold" each stmt, but tries to simplify it down to a
constant/copy, and if that's not possible goes with the original stmt for
further processing rather than using the simplified expression.  That's
wasteful.  It also folds at elimination time, so this early folding is
supposedly redundant iff we think the IL should always be fully folded
(which it isn't, obviously).

For PR118817 I've addressed this case in PRE.  For the more general VN case
it's a bit more difficult to do cleanly and definitely out-of-scope for stage4.

I'll see what the fallout is when moving forwprop4 earlier (the late passes
are oddly ordered IMO).

There's also the pragmatic way of dealing with this in VN which is
replacing the simplification attempt with in-place folding, but that's
only OK when not iterating (or we're first-time visiting a stmt, but I'd
rather not go there).  There's unfortunately a difference between what
fold_stmt and gimple_fold_stmt_to_constant do ... but maybe it does not
matter ... turns out it does.

All of the attempts have testsuite fallout, of course.

Before r5-1495-g24314386b32b93 strlen was even earlier, but it was specifically
placed before VRP.  forwprop is currently specifically after DSE/DCE
because the single-use gates benefit from DCEd IL.  strlen OTOH is a
source of constants and pruned memory ops so placing it before DSE/DCE
makes sense.  At r5-1495-g24314386b32b93 there wasn't a CCP after VRP,
so it might be possible to move strlen a bit later (but then it will be
after another jump threading...).

Moving forwprop between pass_thread_jumps and pass_dominator does have
quite some diagnostic fallout.

Doing

diff --git a/gcc/passes.def b/gcc/passes.def
index 9fd85a35a63..c02fd0e186d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -346,9 +346,10 @@ along with GCC; see the file COPYING3.  If not see
  form if possible.  */
   NEXT_PASS (pass_thread_jumps, /*first=*/false);
   NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
-  NEXT_PASS (pass_strlen);
   NEXT_PASS (pass_thread_jumps_full, /*first=*/false);
   NEXT_PASS (pass_vrp, true /* final_p */);
+  NEXT_PASS (pass_forwprop, /*last=*/true);
+  NEXT_PASS (pass_strlen);
   /* Run CCP to compute alignment and nonzero bits.  */
   NEXT_PASS (pass_ccp, true /* nonzero_p */);
   NEXT_PASS (pass_warn_restrict);
@@ -356,7 +357,6 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_dce, true /* update_address_taken_p */, true /*
remove_unused_locals */);
   /* After late DCE we rewrite no longer addressed locals into SSA
 form if possible.  */
-  NEXT_PASS (pass_forwprop, /*last=*/true);
   NEXT_PASS (pass_sink_code, true /* unsplit edges */);
   NEXT_PASS (pass_phiopt, false /* early_p */);
   NEXT_PASS (pass_fold_builtins);

An even more pragmatic approach is a single level of folding uses (for
changed defs) from SCCP.  For full effect it would use a worklist and
re-fold uses of defs of folded uses as well, similar to
simple_dce_from_worklist (which could also re-fold uses of defs that
become single-use, for example).
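
The worklist idea can be sketched on a toy IR as follows. This is a hypothetical model, not GCC code: `Stmt`, `refold` and `refold_from_worklist` are made-up names, each statement is either a constant or "operand + constant", and statement `i` defines value `i`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy statement: value i is either a constant or "value[operand] + value".
struct Stmt {
  bool is_const;
  int operand;  // index of the used def; ignored when is_const
  long value;   // the constant, or the addend
};

// Re-fold one statement: if its operand is now a constant, fold the add.
bool refold(std::vector<Stmt>& s, std::size_t i) {
  assert(i < s.size());
  Stmt& st = s[i];
  if (st.is_const) return false;
  const Stmt& def = s[static_cast<std::size_t>(st.operand)];
  if (!def.is_const) return false;
  st = Stmt{true, -1, def.value + st.value};
  return true;
}

// Start from the defs that changed, re-fold their uses, and queue every
// statement that folded so that its own uses get revisited as well.
std::size_t refold_from_worklist(std::vector<Stmt>& s,
                                 std::vector<std::size_t> worklist) {
  std::size_t folded = 0;
  while (!worklist.empty()) {
    std::size_t def = worklist.back();
    worklist.pop_back();
    for (std::size_t use = 0; use < s.size(); ++use) {
      if (!s[use].is_const &&
          static_cast<std::size_t>(s[use].operand) == def &&
          refold(s, use)) {
        ++folded;
        worklist.push_back(use);
      }
    }
  }
  return folded;
}
```

A single level of folding would stop after the direct uses of the changed def; the worklist keeps going until no statement changes, which is the "full effect" variant mentioned above.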

diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 0ba85917d41..a0d1c2f3d86 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -284,6 +284,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-into-ssa.h"
 #include "builtins.h"
 #include "case-cfn-macros.h"
+#include "tree-eh.h"

 static tree analyze_sc

[Bug target/109780] [12/13/14/15 Regression] csmith: runtime crash with -O2 -march=znver1

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780
Bug 109780 depends on bug 118936, which changed state.

Bug 118936 Summary: [15 Regression] ICE in ix86_finalize_stack_frame_flags, at config/i386/i386.cc:8683
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118936

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/118936] [15 Regression] ICE in ix86_finalize_stack_frame_flags, at config/i386/i386.cc:8683

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118936

Sam James  changed:

   What|Removed |Added

 CC||sjames at gcc dot gnu.org

--- Comment #15 from Sam James  ---
Patch was reverted in r15-7634-g0312d11be3f666 and r15-7635-g6921c93d205203 to
try again in GCC 15. The revert fixes this PR. Testcase was added, so fixed.

[Bug target/109093] [15 regression] csmith: a February runtime bug ?

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109093
Bug 109093 depends on bug 118936, which changed state.

Bug 118936 Summary: [15 Regression] ICE in ix86_finalize_stack_frame_flags, at config/i386/i386.cc:8683
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118936

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug fortran/107635] [Coarray] Allocatable components of types defined in module's interface are not handled correctly when used in coarrays.

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107635

--- Comment #9 from GCC Commits  ---
The master branch has been updated by Andre Vehreschild :

https://gcc.gnu.org/g:69eb02682b80b84dd0f562f19821c8c8c37ad243

commit r15-7642-g69eb02682b80b84dd0f562f19821c8c8c37ad243
Author: Andre Vehreschild 
Date:   Wed Jan 29 12:42:18 2025 +0100

Fortran: Add send_to_remote [PR107635]

Refactor to use send_to_remote instead of the slow send_by_ref.

gcc/fortran/ChangeLog:

PR fortran/107635

* coarray.cc (move_coarray_ref): Move the coarray reference out
of the given one.  Especially when there is a regular array ref.
(fixup_comp_refs): Move components refs to a derived type where
the codim has been removed, aka a new type.
(split_expr_at_caf_ref): Correctly split the reference chain.
(remove_caf_ref): Simplify.
(create_get_callback): Fix some deficiencies.
(create_allocated_callback): Adapt to new signature of split.
(create_send_callback): New function.
(rewrite_caf_send): Rewrite a call to caf_send to
caf_send_to_remote.
(coindexed_code_callback): Treat caf_send and caf_sendget
correctly.
* gfortran.h (enum gfc_isym_id): Add SENDGET-isym.
* gfortran.texi: Add documentation for send_to_remote.
* resolve.cc (gfc_resolve_code): No longer generate send_by_ref
when allocatable coarray (component) is on the lhs.
* trans-decl.cc (gfc_build_builtin_function_decls): Add
caf_send_to_remote decl.
* trans-intrinsic.cc (conv_caf_func_index): Ensure the static
variables created are not in a block-scope.
(conv_caf_send_to_remote): Translate caf_send_to_remote calls.
(conv_caf_send): Renamed to conv_caf_sendget.
(conv_caf_sendget): Renamed from conv_caf_send.
(gfc_conv_intrinsic_subroutine): Branch correctly for
conv_caf_send and sendget.
* trans.h: Correct decl.

libgfortran/ChangeLog:

* caf/libcaf.h: Add/Correct prototypes for caf_get_from_remote,
caf_send_to_remote.
* caf/single.c (struct accessor_hash_t): Rename accessor_t to
getter_t.
(_gfortran_caf_register_accessor): Use new name of getter_t.
(_gfortran_caf_send_to_remote): New function for sending data to
coarray on a remote image.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/send_char_array_1.f90: Extend test to
catch more cases.
* gfortran.dg/coarray_42.f90: Invert tests use, because no
longer a send is needed when local memory in a coarray is
allocated.

[Bug fortran/107635] [Coarray] Allocatable components of types defined in module's interface are not handled correctly when used in coarrays.

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107635

--- Comment #11 from GCC Commits  ---
The master branch has been updated by Andre Vehreschild :

https://gcc.gnu.org/g:d3244675441faf9c2d3949821f7deee34705e9c8

commit r15-7644-gd3244675441faf9c2d3949821f7deee34705e9c8
Author: Andre Vehreschild 
Date:   Fri Feb 7 12:09:53 2025 +0100

Fortran: Remove deprecated coarray routines [PR107635]

gcc/fortran/ChangeLog:

PR fortran/107635

* gfortran.texi: Remove deprecated functions from documentation.
* trans-decl.cc (gfc_build_builtin_function_decls): Remove
deprecated function decls.
* trans-intrinsic.cc (gfc_conv_intrinsic_exponent): Remove
deprecated/no longer needed routines.
* trans.h: Remove unused decls.

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_get): Removed because deprecated.
(_gfortran_caf_send): Same.
(_gfortran_caf_sendget): Same.
(_gfortran_caf_send_by_ref): Same.
* caf/single.c (assign_char4_from_char1): Same.
(assign_char1_from_char4): Same.
(convert_type): Same.
(defined): Same.
(_gfortran_caf_get): Same.
(_gfortran_caf_send): Same.
(_gfortran_caf_sendget): Same.
(copy_data): Same.
(get_for_ref): Same.
(_gfortran_caf_get_by_ref): Same.
(send_by_ref): Same.
(_gfortran_caf_send_by_ref): Same.
(_gfortran_caf_sendget_by_ref): Same.

[Bug fortran/107635] [Coarray] Allocatable components of types defined in module's interface are not handled correctly when used in coarrays.

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107635

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Andre Vehreschild :

https://gcc.gnu.org/g:15847252648ede9d2ad9eea398b7b870f62a2b30

commit r15-7641-g15847252648ede9d2ad9eea398b7b870f62a2b30
Author: Andre Vehreschild 
Date:   Wed Jan 22 15:12:29 2025 +0100

Fortran: Add caf_is_present_on_remote. [PR107635]

Replace caf_is_present by caf_is_present_on_remote which is using a
dedicated callback for each object to test on the remote image.

gcc/fortran/ChangeLog:

PR fortran/107635

* coarray.cc (create_allocated_callback): Add creating remote
side procedure for checking allocation status of coarray.
(rewrite_caf_allocated): Rewrite ALLOCATED on coarray to use caf
routine.
(coindexed_expr_callback): Exempt caf_is_present_on_remote from
being rewritten again.
* gfortran.h (enum gfc_isym_id): Add caf_is_present_on_remote
id.
* gfortran.texi: Add documentation for caf_is_present_on_remote.
* intrinsic.cc (add_functions): Add caf_is_present_on_remote
symbol.
* trans-decl.cc (gfc_build_builtin_function_decls): Define
interface of caf_is_present_on_remote.
* trans-intrinsic.cc (gfc_conv_intrinsic_caf_is_present_remote):
Translate caf_is_present_on_remote.
(trans_caf_is_present): Remove.
(caf_this_image_ref): Remove.
(gfc_conv_allocated): Take out coarray treatment, because that
is rewritten to caf_is_present_on_remote now.
(gfc_conv_intrinsic_function): Handle caf_is_present_on_remote
calls.
* trans.h: Add symbol for caf_is_present_on_remote and remove
old one.

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_is_present_on_remote): Add new
function.
(_gfortran_caf_is_present): Remove deprecated one.
* caf/single.c (struct accessor_hash_t): Add function ptr access
for remote side call.
(_gfortran_caf_is_present_on_remote): Added.
(_gfortran_caf_is_present): Removed.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/coarray_allocated.f90: Adapt to new method
of checking on remote image.
* gfortran.dg/coarray_lib_alloc_4.f90: Same.

[Bug fortran/107635] [Coarray] Allocatable components of types defined in module's interface are not handled correctly when used in coarrays.

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107635

--- Comment #5 from GCC Commits  ---
The master branch has been updated by Andre Vehreschild :

https://gcc.gnu.org/g:90ba8291c31f2cfb6a8c7bf0c0d6a9d93bbbacc9

commit r15-7638-g90ba8291c31f2cfb6a8c7bf0c0d6a9d93bbbacc9
Author: Andre Vehreschild 
Date:   Wed Jan 8 12:33:27 2025 +0100

Fortran: Move caf_get-rewrite to coarray.cc [PR107635]

Add a rewriter to keep all expression tree that is not optimization
together.  At the moment this is just a move from resolve.cc, but will
be extended to handle more cases where rewriting the expression tree may
be easier.  The first use case is to extract accessors for coarray
remote image data access.

gcc/fortran/ChangeLog:

PR fortran/107635
* Make-lang.in: Add coarray.cc.
* coarray.cc: New file.
* gfortran.h (gfc_coarray_rewrite): New procedure.
* parse.cc (rewrite_expr_tree): Add entrypoint for rewriting
expression trees.
* resolve.cc (gfc_resolve_ref): Remove caf_lhs handling.
(get_arrayspec_from_expr): Moved to rewrite.cc.
(remove_coarray_from_derived_type): Same.
(convert_coarray_class_to_derived_type): Same.
(split_expr_at_caf_ref): Same.
(check_add_new_component): Same.
(create_get_parameter_type): Same.
(create_get_callback): Same.
(add_caf_get_intrinsic): Same.
(resolve_variable): Remove caf_lhs handling.

libgfortran/ChangeLog:

* caf/single.c (_gfortran_caf_finalize): Free memory preventing
leaks.
(_gfortran_caf_get_by_ct): Fix constness.
* caf/libcaf.h (_gfortran_caf_register_accessor): Fix constness.

[Bug fortran/107635] [Coarray] Allocatable components of types defined in module's interface are not handled correctly when used in coarrays.

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107635

--- Comment #7 from GCC Commits  ---
The master branch has been updated by Andre Vehreschild :

https://gcc.gnu.org/g:abbfeb2ecbb5e90aa5d68e489ac283348ee6b8d5

commit r15-7640-gabbfeb2ecbb5e90aa5d68e489ac283348ee6b8d5
Author: Andre Vehreschild 
Date:   Wed Jan 22 13:36:21 2025 +0100

Fortran: Allow to use non-pure/non-elemental functions in coarray indexes
[PR107635]

Extract calls to non-pure or non-elemental functions from index
expressions on a coarray.

gcc/fortran/ChangeLog:

PR fortran/107635

* coarray.cc (get_arrayspec_from_expr): Treat array result of
function calls correctly.
(remove_coarray_from_derived_type): Prevent memory loss.
(add_caf_get_from_remote): Correct locus.
(find_comp): New function to find or create a new component in a
derived type.
(check_add_new_comp_handle_array): Handle allocatable arrays or
non-pure/non-elemental functions in indexes of coarrays.
(check_add_new_component): Use above function.
(create_get_parameter_type): Rename to
create_caf_add_data_parameter_type.
(create_caf_add_data_parameter_type): Renaming of variable and
make the additional data a coarray.
(remove_caf_ref): Factor out to reuse in other caf-functions.
(create_get_callback): Use function factored out, set locus
correctly and ensure a kind is set for parameters.
(add_caf_get_intrinsic): Rename to add_caf_get_from_remote and
rename some variables.
(coindexed_expr_callback): Skip over function created by the
rewriter.
(coindexed_code_callback): Filter some intrinsics not to
process.
(gfc_coarray_rewrite): Rewrite also contained functions.
* trans-intrinsic.cc (gfc_conv_intrinsic_caf_get): Reflect
changed order on caf_get_from_remote ().

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_register_accessor): Reflect
changed parameter order.
* caf/single.c (struct accessor_hash_t): Same.
(_gfortran_caf_register_accessor): Call accessor using a token
for accessing arrays with a descriptor on the source side.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_lib_comm_1.f90: Adapt scan expression.
* gfortran.dg/coarray/get_with_fn_parameter.f90: New test.
* gfortran.dg/coarray/get_with_scalar_fn.f90: New test.

[Bug fortran/107635] [Coarray] Allocatable components of types defined in module's interface are not handled correctly when used in coarrays.

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107635

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Andre Vehreschild :

https://gcc.gnu.org/g:8bf0ee8d62b8a08e808344d31354ab713157e15d

commit r15-7643-g8bf0ee8d62b8a08e808344d31354ab713157e15d
Author: Andre Vehreschild 
Date:   Fri Feb 7 11:25:31 2025 +0100

Fortran: Add transfer_between_remotes [PR107635]

Add the last missing coarray data manipulation routine using remote
accessors.

gcc/fortran/ChangeLog:

PR fortran/107635

* coarray.cc (rewrite_caf_send): Rewrite to
transfer_between_remotes when both sides of the assignment have
a coarray.
(coindexed_code_callback): Prevent duplicate rewrite.
* gfortran.texi: Add documentation for transfer_between_remotes.
* intrinsic.cc (add_subroutines): Add intrinsic symbol for
caf_sendget to allow easy rewrite to transfer_between_remotes.
* trans-decl.cc (gfc_build_builtin_function_decls): Add
prototype for transfer_between_remotes.
* trans-intrinsic.cc (conv_caf_vector_subscript_elem): Mark as
deprecated.
(conv_caf_vector_subscript): Same.
(compute_component_offset): Same.
(conv_expr_ref_to_caf_ref): Same.
(conv_stat_and_team): Extract stat and team from expr.
(gfc_conv_intrinsic_caf_get): Use conv_stat_and_team.
(conv_caf_send_to_remote): Same.
(has_ref_after_cafref): Mark as deprecated.
(conv_caf_sendget): Translate to transfer_between_remotes.
* trans.h: Add prototype for transfer_between_remotes.

libgfortran/ChangeLog:

* caf/libcaf.h: Add prototype for transfer_between_remotes.
* caf/single.c: Implement transfer_between_remotes.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_lib_comm_1.f90: Fix up scan_trees.

[Bug bootstrap/118802] [15 regression] Bootstrap comparison failure on libphobos/libdruntime/core/internal/gc/impl/conservative/gc.o

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118802

Sam James  changed:

   What|Removed |Added

  Component|other   |bootstrap

--- Comment #15 from Sam James  ---
Reproduced manually on another machine (phew). Reducing the options needed
first in the script. Will try your trick next to debug.

[Bug fortran/107635] [Coarray] Allocatable components of types defined in module's interface are not handled correctly when used in coarrays.

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107635

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Andre Vehreschild :

https://gcc.gnu.org/g:b114312bbaae51567bc0436d07990c4fbaa3c81d

commit r15-7639-gb114312bbaae51567bc0436d07990c4fbaa3c81d
Author: Andre Vehreschild 
Date:   Wed Jan 8 12:33:36 2025 +0100

Fortran: Prepare for more caf-rework. [PR107635]

Factor out generation of code to get remote function index and to
create the additional data structure.  Rename caf_get_by_ct to
caf_get_from_remote.

gcc/fortran/ChangeLog:

PR fortran/107635

* gfortran.texi: Rename caf_get_by_ct to caf_get_from_remote.
* trans-decl.cc (gfc_build_builtin_function_decls): Rename
intrinsic.
* trans-intrinsic.cc (conv_caf_func_index): Factor out
functionality to be reused by other caf-functions.
(conv_caf_add_call_data): Same.
(gfc_conv_intrinsic_caf_get): Use functions factored out.
* trans.h: Rename intrinsic symbol.

libgfortran/ChangeLog:

* caf/libcaf.h (_gfortran_caf_get_by_ref): Remove from ABI.
This function is replaced by caf_get_from_remote ().
(_gfortran_caf_get_remote_function_index): Use better name.
* caf/single.c (_gfortran_caf_finalize): Free internal data.
(_gfortran_caf_get_by_ref): Remove from public interface, but
keep it, because it is still used by sendget ().

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray_lib_comm_1.f90: Adapt to renamed ABI
function.
* gfortran.dg/coarray_stat_function.f90: Same.
* gfortran.dg/coindexed_1.f90: Same.

[Bug ipa/118318] [15 regression] ICE when building firefox-134.0 with PGO

2025-02-20 Thread hubicka at ucw dot cz via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118318

--- Comment #15 from Jan Hubicka  ---
> Breakpoint 5.2, profile_count::operator+= (this=0x76e7e888, other=...) at
> /usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/profile-count.h:932
> 932   gcc_checking_assert (compatible_p (other));
> (gdb) p other
> $38 = (const profile_count &) @0x7fff72a0: {
>   static n_bits = 61,
>   static max_count = 2305843009213693950,
>   static uninitialized_count = 2305843009213693951,
>   m_val = 3694,
>   m_quality = ADJUSTED
> }
> (gdb) p *this
> $39 = {
>   static n_bits = 61,
>   static max_count = 2305843009213693950,
>   static uninitialized_count = 2305843009213693951,
>   m_val = 14776,
>   m_quality = GUESSED_GLOBAL0
> }
> (gdb)

Thanks a lot! So what happened is that we cloned a function and concluded
it executes 0 times, while we want to make a call inside that function
execute 3694 times.  I don't think we should clone a function we think
will be executed 0 times.  What can probably happen is that we are
updating the count of the non-specialized (original) function and this comes
out as a roundoff error...

Honza
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug target/109093] [15 regression] csmith: a February runtime bug ?

2025-02-20 Thread hjl.tools at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109093

H.J. Lu  changed:

   What|Removed |Added

Attachment #60462 is obsolete|0   |1

--- Comment #42 from H.J. Lu  ---
Created attachment 60539
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60539&action=edit
A patch for GCC 16

[Bug target/109780] [12/13/14/15 Regression] csmith: runtime crash with -O2 -march=znver1

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780

Sam James  changed:

   What|Removed |Added

  Known to work|15.0|
Summary|[12/13/14 Regression]   |[12/13/14/15 Regression]
   |csmith: runtime crash with  |csmith: runtime crash with
   |-O2 -march=znver1   |-O2 -march=znver1
 Depends on||118936

--- Comment #43 from Sam James  ---
Patch was reverted in r15-7634-g0312d11be3f666 and r15-7635-g6921c93d205203 to
try again in GCC 15.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118936
[Bug 118936] [15 Regression] ICE in ix86_finalize_stack_frame_flags, at
config/i386/i386.cc:8683

[Bug target/109093] [15 regression] csmith: a February runtime bug ?

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109093

Sam James  changed:

   What|Removed |Added

 Status|RESOLVED|NEW
 Depends on||118936
 Resolution|FIXED   |---
   Keywords|needs-bisection |

--- Comment #40 from Sam James  ---
Patch was reverted in r15-7634-g0312d11be3f666 and r15-7635-g6921c93d205203 to
try again in GCC 15.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118936
[Bug 118936] [15 Regression] ICE in ix86_finalize_stack_frame_flags, at
config/i386/i386.cc:8683

[Bug target/109093] [15 regression] csmith: a February runtime bug ?

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109093

Sam James  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |hjl.tools at gmail dot com

[Bug sarif-replay/96032] RFE: add a way to use output from --fdiagnostics-format=json or sarif as input

2025-02-20 Thread kdudka at redhat dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96032

--- Comment #12 from Kamil Dudka  ---
I confirm that sarif-replay is available on f42+ and it seems to work as
expected.  Thanks!

[Bug target/118936] [15 Regression] ICE in ix86_finalize_stack_frame_flags, at config/i386/i386.cc:8683

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118936

Sam James  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Assignee|unassigned at gcc dot gnu.org  |hjl.tools at gmail dot com
 Status|NEW |RESOLVED

--- Comment #14 from Sam James  ---
.

[Bug target/109093] [15 regression] csmith: a February runtime bug ?

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109093

Sam James  changed:

   What|Removed |Added

   Priority|P1  |P2

--- Comment #41 from Sam James  ---
I think we want to make it P2 instead for now. It's the same issue that goes
back way further than trunk, just this testcase is 15-only.

[Bug libstdc++/98749] No precondition checks in , and

2025-02-20 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98749

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|15.0|16.0

[Bug libstdc++/118395] Constructor of std::barrier is not constexpr

2025-02-20 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118395

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |16.0

[Bug target/94173] Superfluous stackpointer manipulation

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94173

Sam James  changed:

   What|Removed |Added

 CC||blubban at gmail dot com

--- Comment #6 from Sam James  ---
*** Bug 118946 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/118946] Missed optimization: GCC reserves stack space for optimized-out variable

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118946

--- Comment #3 from Sam James  ---


*** This bug has been marked as a duplicate of bug 94173 ***

[Bug target/118540] RISC-V: ICE for unsupported target attribute

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118540

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug target/118057] RISC-V: Can't vectorize load and store with zvl128b

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057

--- Comment #8 from Jeffrey A. Law  ---
This is really a costing issue.

On some designs (such as Ventana's) strided access can be very profitable,
particularly for a relatively small stride.  On others it may be considerably
worse.

Point being, someone will have to build a cost model for each design to describe
the costs in a sane manner.

[Bug tree-optimization/117634] memset/struct copy transformation into memset/memset is not done if not address

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117634

Andrew Pinski  changed:

   What|Removed |Added

 CC||blubban at gmail dot com

--- Comment #4 from Andrew Pinski  ---
*** Bug 118946 has been marked as a duplicate of this bug. ***

[Bug target/116662] The value of __GCC_DESTRUCTIVE_SIZE for riscv64 could be improved

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116662

Jeffrey A. Law  changed:

   What|Removed |Added

   Last reconfirmed||2025-02-21
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

[Bug target/118734] RISC-V: Vector broadcast via strided load.

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118734

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2025-02-20
 Ever confirmed|0   |1

[Bug tree-optimization/118954] [15 regression] Miscompile at -O3 since r15-1757-g4d24159a1fcb15

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118954

Sam James  changed:

   What|Removed |Added

Summary|[15 regression] Miscompile  |[15 regression] Miscompile
   |at -O3  |at -O3 since
   ||r15-1757-g4d24159a1fcb15
   Keywords|needs-bisection |

--- Comment #10 from Sam James  ---
OK, it is r15-1757-g4d24159a1fcb15 with dumps too.

[Bug bootstrap/118802] [15 regression] Bootstrap comparison failure on libphobos/libdruntime/core/internal/gc/impl/conservative/gc.o since r15-7400-gd3ff498c478ace

2025-02-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118802

--- Comment #22 from Hongtao Liu  ---
(In reply to Sam James from comment #16)
> Bisected to r15-7400-gd3ff498c478ace (not CCing anyone yet as not enough
> useful information).

There's a new patch in [1] which will revert the commit and may fix it (or make
it latent again).

[1] https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675714.html

[Bug c/118963] New: Miscompile at -O2/3

2025-02-20 Thread yunboni at smail dot nju.edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118963

Bug ID: 118963
   Summary: Miscompile at -O2/3
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yunboni at smail dot nju.edu.cn
  Target Milestone: ---

This code times out at -O2/3 and prints 0 at -O0/1/s:

```c
int printf(const char *, ...);
int a = -104, b, c, e;
void g(int h) {
  int f = 0;
  while (!f + a - -104) {
    f = h == 0;
    if (f)
      h = 1;
  }
}
int main() {
  int d = 8;
  for (; e;)
    d = 0;
  c = d;
  g(81 - 81);
  printf("%X\n", b);
}
```

Compiler Explorer: https://godbolt.org/z/q3WajnYPj 

Bisected to
https://github.com/gcc-mirror/gcc/commit/429a7a88438cc80e7c58d9f63d44838089899b12

[Bug bootstrap/118802] [15 regression] Bootstrap comparison failure on libphobos/libdruntime/core/internal/gc/impl/conservative/gc.o since r15-7400-gd3ff498c478ace

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118802

--- Comment #21 from Sam James  ---
I understand, thanks. I'll keep whittling it down.

[Bug target/117544] Lack of vsetvli after function call for whole register move

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544

--- Comment #4 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug target/118934] [15 Regression] RISC-V: ICE: output_operand: invalid expression as operand

2025-02-20 Thread anton at ozlabs dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118934

--- Comment #2 from Anton Blanchard  ---
This reproduces the issue. Build without optimisation to avoid all the code
disappearing:

#define INSNS_1 x = x + 1;
#define INSNS_2 INSNS_1 INSNS_1
#define INSNS_4 INSNS_2 INSNS_2
#define INSNS_8 INSNS_4 INSNS_4
#define INSNS_16 INSNS_8 INSNS_8
#define INSNS_32 INSNS_16 INSNS_16
#define INSNS_64 INSNS_32 INSNS_32
#define INSNS_128 INSNS_64 INSNS_64
#define INSNS_256 INSNS_128 INSNS_128
#define INSNS_512 INSNS_256 INSNS_256
#define INSNS_1024 INSNS_512 INSNS_512
#define INSNS_2048 INSNS_1024 INSNS_1024
#define INSNS_4096 INSNS_2048 INSNS_2048
#define INSNS_8192 INSNS_4096 INSNS_4096
#define INSNS_16384 INSNS_8192 INSNS_8192
#define INSNS_32768 INSNS_16384 INSNS_16384
#define INSNS_65536 INSNS_32768 INSNS_32768
#define INSNS_131072 INSNS_65536 INSNS_65536
#define INSNS_262144 INSNS_131072 INSNS_131072
#define INSNS_524288 INSNS_262144 INSNS_262144
#define INSNS_1048576 INSNS_524288 INSNS_524288

int foo(int x)
{
  if (x)
    goto out;

  // > 1MB of code
  INSNS_524288

out:
  return x;
}

# riscv64-unknown-linux-gnu-gcc -c large-function.c
during RTL pass: final
large-function.c: In function 'foo':
large-function.c:33:1: internal compiler error: output_operand: invalid
expression as operand

[Bug target/80878] -mcx16 (enable 128 bit CAS) on x86_64 seems not to work on 7.1.0

2025-02-20 Thread lh_mouse at 126 dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

--- Comment #49 from LIU Hao  ---
(In reply to Luke Dalessandro from comment #48)
> So my understanding is that 104688 basically determined that it's correct to
> implement atomic load with movdqa for aligned addresses on architectures
> with AVX support. And hence gcc could inline that in the same way clang
> does, and inline cmpxchg16b for
> compare_exchange/__atomic_compare_exchange{_n} as well. And thus there no
> longer has to be a libatomic call for any of these.

Yes. However I suspect it might be an ABI break.


> I can support the fact that -mcx16 is maybe the wrong flag to use to force
> inlining here given its cmpxchg-style name, but it really feels like a
> sophisticated user that's willing to live in implementation-defined land
> should be able to get the same performance for lock-free code out of gcc
> that it does out of clang in this situation.

May I remind you about https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878#c42 ?

First, CMPXCHG16B can be much slower than CMPXCHG:
https://quick-bench.com/q/MZioNHkbBn0soH_KSDyYcKmrrxU

Second, not all x86-64 processors support CMPXCHG16B, so `-mcx16` is required,
like `-mavx`.
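
Whether a given translation unit was built with `-mcx16` can be checked from the preprocessor. The following is only a minimal sketch, assuming GCC or Clang, which define `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` when a 16-byte compare-and-swap is enabled for the target; the function name `built_with_cx16` is made up for illustration:

```c
/* Returns 1 when the compiler was told that a 16-byte compare-and-swap
   is available for the target (for example x86-64 with -mcx16), and 0
   otherwise.  This reflects the compile-time setting only; it says
   nothing about the CPU the program eventually runs on. */
int built_with_cx16(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
  return 1;
#else
  return 0;
#endif
}
```

This does not answer the ABI question raised above; it only makes the flag dependency visible in code.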

[Bug target/117544] Lack of vsetvli after function call for whole register move

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117544

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Jeffrey A. Law  ---
Forgot to change state to closed...

[Bug fortran/118932] Testcase gfortran.dg/binding_label_tests_34.f90 needs standard checking

2025-02-20 Thread tkoenig at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118932

--- Comment #4 from Thomas Koenig  ---
Hm, maybe I am misunderstanding the standard here, or it says something
that was not intentional...

We accept

program memain
  interface
     subroutine lower () bind(c,name="foo")
     end subroutine lower
     subroutine upper () bind(c,name="FOO")
     end subroutine upper
  end interface
  call lower
  call upper
end program memain

but probably due to error rather than design, as -fdump-fortran-global
shows:

name=FOO
name=foo, sym_name=upper, binding_label=FOO
name=memain

[Bug target/94173] Superfluous stackpointer manipulation

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94173

Jeffrey A. Law  changed:

   What|Removed |Added

Summary|[RISCV] Superfluous |Superfluous stackpointer
   |stackpointer manipulation   |manipulation
 Target|riscv   |

--- Comment #5 from Jeffrey A. Law  ---
Making this more generic as it's not specific to any target.

[Bug tree-optimization/118947] Missed optimization: GCC forgets stack buffer contents across function call

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118947

Andrew Pinski  changed:

   What|Removed |Added

  Component|rtl-optimization|tree-optimization

--- Comment #3 from Andrew Pinski  ---
Oh optimize_memcpy_to_memset does not understand "" either. I will add that
support too.


Here is a testcase that shows the vdef issue and is not related to the "" issue:
```
void* aaa();
void* bbb()
{
    aaa();
    static int buf2[32];
    int buf[32] = {};
    __builtin_memcpy(buf2, buf, sizeof(buf2));
    return buf2;
}
void* ccc()
{
    int buf[32] = {};
    aaa();
    static int buf2[32];
    __builtin_memcpy(buf2, buf, sizeof(buf2));
    return buf2;
}
```

[Bug analyzer/94713] Analyzer is buggy on uninitialized pointer

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94713

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||blubban at gmail dot com

--- Comment #5 from Jeffrey A. Law  ---
*** Bug 118946 has been marked as a duplicate of this bug. ***

[Bug rtl-optimization/118946] Missed optimization: GCC reserves stack space for optimized-out variable

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118946

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Jeffrey A. Law  ---

Marking as a duplicate of one I happen to know about.  I suspect there are
others.

*** This bug has been marked as a duplicate of bug 94713 ***

[Bug target/118950] [14/15 regression] RISC-V: rv64gcv runtime mismatch at -O3 since r14-4038-gb975c0dc3be

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2025-02-20
 Status|UNCONFIRMED |NEW

[Bug target/118945] RISC-V: VSETL pass: Don't promote Vectors ops from Tail agnostic to Tail Undisturbed

2025-02-20 Thread vineetg at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945

--- Comment #10 from Vineet Gupta  ---
(In reply to JuzheZhong from comment #9)
> 
> I think we should consider many more situations, and consider them
> carefully. Like:
> 
> vsetvli ... e8,mf8 ta ma (demand ratio)
> ...
> vsetvli zero zero e32 mf2 tu ma (demand ratio)
> ...
> vsetvli zero zero e64 m1 ta ma (demand SEW and LMUL)
> ...
> vsetvli zero zero e64 m1 ta mu (demand ratio)
> ...
> vsetvli zero zero e16 mf4 tu mu (demand ratio)
> ...
> vsetvli zero zero e32 mf2 ta ma (demand ratio)
> ...
> vsetvli zero zero e8 mf8 ta ma (demand ratio)
> 
> In the current strategy, 7 "vsetvli"s will be fused into 1 single "vsetvli":
> 
> vsetvli ... e64 m1 tu mu
> 
> However, if you just keep agnostic and do not allow fusing it, you will end up
> with 6 more "vsetvli"s. I don't think this codegen can be better in any
> micro-architecture design.

While the original test was too simple and contrived, this is too complex and
contrived :-)  I'd argue that if there's such toggling of tail and mask
policies then yeah, it's fine to have so many vsetvls.

We all agree this will be a CPU tune to retain the existing behavior while
providing the new behavior as opt-in for uarches that see fit.
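
The fusion described in the quoted sequence can be modelled in a few lines. This is only a toy sketch of the idea, not the code in GCC's vsetvl pass: the struct and the functions `sew_lmul_ratio`, `fuse` and `fuse_all` are invented for illustration, and the model assumes that a "demand ratio" entry only needs the SEW/LMUL ratio to match and that an undisturbed policy is allowed to satisfy an agnostic demand (the existing behavior under discussion).

```c
#include <stdbool.h>

/* One vsetvli demand, reduced to the fields discussed above.
   LMUL is kept as a fraction: mf8 is 1/8, m1 is 1/1. */
struct vtype_demand {
  int sew;                /* element width in bits (e8, e16, ...) */
  int lmul_num;
  int lmul_den;
  bool tail_undisturbed;  /* true for tu, false for ta */
  bool mask_undisturbed;  /* true for mu, false for ma */
  bool exact_sew_lmul;    /* demands SEW and LMUL, not just their ratio */
};

/* SEW/LMUL ratio, e.g. e8 with mf8 gives 8 * 8 / 1 = 64. */
int sew_lmul_ratio(const struct vtype_demand *d)
{
  return d->sew * d->lmul_den / d->lmul_num;
}

/* Try to fold demand b into the accumulated demand a.
   Returns false when the two cannot share a single vsetvli. */
bool fuse(struct vtype_demand *a, const struct vtype_demand *b)
{
  if (sew_lmul_ratio(a) != sew_lmul_ratio(b))
    return false;
  /* With equal ratios, equal SEW implies equal LMUL. */
  if (a->exact_sew_lmul && b->exact_sew_lmul && a->sew != b->sew)
    return false;
  if (b->exact_sew_lmul && !a->exact_sew_lmul) {
    a->sew = b->sew;
    a->lmul_num = b->lmul_num;
    a->lmul_den = b->lmul_den;
    a->exact_sew_lmul = true;
  }
  /* Agnostic demands accept undisturbed, so undisturbed wins. */
  a->tail_undisturbed = a->tail_undisturbed || b->tail_undisturbed;
  a->mask_undisturbed = a->mask_undisturbed || b->mask_undisturbed;
  return true;
}

/* Fuse a whole sequence of n demands into *out. */
bool fuse_all(const struct vtype_demand *seq, int n, struct vtype_demand *out)
{
  *out = seq[0];
  for (int i = 1; i < n; i++)
    if (!fuse(out, &seq[i]))
      return false;
  return true;
}
```

Fed the seven demands from the quote, this model ends at e64 m1 tu mu, matching the single fused vsetvli. In the same model, the opt-in behavior would correspond to refusing the fusion in `fuse` when the tail policies differ instead of promoting ta to tu.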

[Bug target/117955] GCC generate illegal riscv instruction `vsetvli zero,zero,e64,mf4,ta,ma` with -O2 and -O3

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117955

--- Comment #6 from Jeffrey A. Law  ---
As I feared, this has just gone latent.  If you revert:

bdbbe5d4b6d495ac06ee762540a1277498f2a7a0
7bef3482f27ce13ba7e6c4f43943f28a49e63a40

This can be triggered again on the trunk.  Given the sensitivity to scheduling
changes I suspect it's ultimately a vsetvl optimization issue.

[Bug tree-optimization/14295] [tree-ssa] copy propagation for aggregates

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14295

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #11 from Andrew Pinski  ---
.

[Bug tree-optimization/14295] [tree-ssa] copy propagation for aggregates

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14295

--- Comment #12 from Andrew Pinski  ---
optimize_memcpy_to_memset does some simple copy prop but with zeroing. A
similar method could be done for non-zeroing and I am going to try that.
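
For readers unfamiliar with the pattern, here is a source-level sketch of the zeroing case. It makes no claim about what GCC emits for it, and the function names and `BUF_LEN` are hypothetical; it only shows two forms that are observably equivalent, which is what makes propagating the zeroing through the copy valid.

```c
#include <string.h>

enum { BUF_LEN = 32 };

/* Zero a temporary, then copy it out: the memset-then-memcpy shape. */
void fill_via_copy(int *dst)
{
  int tmp[BUF_LEN];
  memset(tmp, 0, sizeof tmp);
  memcpy(dst, tmp, sizeof tmp);
}

/* The same effect once the copy source is known to be all zeros:
   the memcpy can become a memset of the destination. */
void fill_direct(int *dst)
{
  memset(dst, 0, BUF_LEN * sizeof *dst);
}
```

The non-zeroing case would need to track the actual stored aggregate value rather than a known-zero state.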

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #16 from Sam James  ---
Created attachment 60552
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60552&action=edit
emacs.log.xz

So far, not got anywhere with attempting to copy our packaging into a script.

I've attached a build log from building Emacs from git (just ./autogen.sh &&
./configure && make V=1 -j$(nproc) -l$(nproc)) using Gentoo's GCC in case you
can spot some difference with your own.

I'm going to see if I can reproduce in a Docker container using Gentoo's GCC
and go from there.

[Bug target/115763] RISC-V: Use wrong SEW for vfmv.v.f when -march only has zvfhmin

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115763

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Jeffrey A. Law  ---
Per c#4, c#8, c#10.

[Bug rtl-optimization/115523] [avr] Remove SFmode insns

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115523

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #7 from Jeffrey A. Law  ---
Per c#6.

[Bug target/115795] RISC-V: vsetvl step causes wrong codegen after fusing info

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115795

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Jeffrey A. Law  ---
Per c#8.

[Bug target/114809] [RISC-V RVV] Counting elements might be simpler

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

--- Comment #4 from Jeffrey A. Law  ---
I fixed the missed peephole a while back.  But the question about cpop vs other
strategies remains.

[Bug target/118931] [15 Regression] RISC-V: rv64gcv miscompile at -O[23] since r15-3228-g771256bcb9d

2025-02-20 Thread pan2.li at intel dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118931

--- Comment #2 from Li Pan  ---
int main ()
{
  vector(16) unsigned char vect__3.5;
  unsigned char a_lsm.2;
  long long int _5;
  vector(16) unsigned char _13;
  unsigned char _29;

  [local count: 71618576]:
  a_lsm.2_20 = a;
  _13 = {a_lsm.2_20, a_lsm.2_20, a_lsm.2_20, a_lsm.2_20, a_lsm.2_20,
         a_lsm.2_20, a_lsm.2_20, a_lsm.2_20, a_lsm.2_20, a_lsm.2_20,
         a_lsm.2_20, a_lsm.2_20, a_lsm.2_20, a_lsm.2_20, a_lsm.2_20,
         a_lsm.2_20};
  vect__3.5_25 = _13 * { 151, 17, 7, 33, 119, 49, 231, 65, 87, 81,
                         199, 97, 55, 113, 167, 129 };
  _29 = .VEC_EXTRACT (vect__3.5_25, 13);
  a = _29;
  _5 = (long long int) _29;
  __builtin_printf ("%llu\n", _5);
  return 0;

}

The tree-optimized dump is correct: (unsigned char)(109 * 113) = 29 is what
we expect.  This should be a backend issue.
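
The arithmetic in that comment can be checked with a scalar model of the
dump. This is only a sketch: the multiplier table is copied from the dump,
lane 13 is the one .VEC_EXTRACT reads, and a == 109 is taken from the
"109 * 113" in the comment. The names `mult` and `expected_lane` are made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Multipliers copied from the tree-optimized dump. */
static const unsigned char mult[16] = {
    151, 17, 7, 33, 119, 49, 231, 65, 87, 81, 199, 97, 55, 113, 167, 129
};

/* Scalar model of the vector code: broadcast a, multiply lane-wise,
   truncate to unsigned char, and extract one lane. */
unsigned char expected_lane(unsigned char a, size_t lane)
{
    assert(lane < 16);
    return (unsigned char)(a * mult[lane]);
}
```

Calling expected_lane(109, 13) multiplies 109 by 113 and keeps the low 8 bits,
which is the value the comment says the backend should have produced.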

[Bug target/80878] -mcx16 (enable 128 bit CAS) on x86_64 seems not to work on 7.1.0

2025-02-20 Thread ldalessandro at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

--- Comment #48 from Luke Dalessandro  ---

(In reply to LIU Hao from comment #47)
> (In reply to Luke Dalessandro from comment #46)
> > But if 104688 isn't related to this issue, and thus Jakub's comment was in
> > error, I definitely don't understand the underlying problem and why clang is
> > fine doing it.
> 
> Issue here is that if atomic load is implemented with a call to libatomic
> routines then it's incorrect to implement CAS without a call.

So my understanding is that 104688 basically determined that it's correct to
implement atomic load with movdqa for aligned addresses on architectures with
AVX support. And hence gcc could inline that in the same way clang does, and
inline cmpxchg16b for compare_exchange/__atomic_compare_exchange{_n} as well.
And thus there no longer has to be a libatomic call for any of these.

I can support the fact that -mcx16 is maybe the wrong flag to use to force
inlining here given its cmpxchg-style name, but it really feels like a
sophisticated user that's willing to live in implementation-defined land should
be able to get the same performance for lock-free code out of gcc that it does
out of clang in this situation.
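
For anyone comparing compilers here, a small sketch of how to ask the
compiler what it promises for 16-byte atomics. It only uses the
__atomic_always_lock_free builtin, which is folded at compile time and needs
no libatomic; the function names are hypothetical, and the answer depends on
the compiler, target, and flags, so no output is claimed.

```c
#include <stdbool.h>

/* True only if the compiler guarantees that every 16-byte atomic object
   is lock-free on this target with the current flags. */
bool cas16_always_lock_free(void)
{
    return __atomic_always_lock_free(16, 0);
}

/* Same question for a pointer-sized object, for comparison. */
bool word_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(void *), 0);
}
```

Building this once with each compiler (with and without -mcx16) and printing
both results shows whether the compiler is prepared to inline the 16-byte
operations or will route them through a library call.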

[Bug target/118931] [15 Regression] RISC-V: rv64gcv miscompile at -O[23] since r15-3228-g771256bcb9d

2025-02-20 Thread pan2.li at intel dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118931

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #1 from Li Pan  ---
Reproduced from upstream with "-mrvv-vector-bits=zvl", will take a look.

[Bug target/113715] RISC-V: If the Zcmp is enabled, the a0 register operates abnormally when the program returns

2025-02-20 Thread law at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113715

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #9 from Jeffrey A. Law  ---
Fixed on the trunk.  No plans for further backports.

[Bug middle-end/23782] SRA pessimizes passing structures by value at -Os (+22% code size)

2025-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23782

--- Comment #9 from Andrew Pinski  ---
I have a patch which builds on top of PR 14295 which improves the situation
here. It has a few testcase regressions but those are testcase issues which I
will fix up later on.

[Bug target/118950] [14/15 regression] RISC-V: rv64gcv runtime mismatch at -O3 since r14-4038-gb975c0dc3be

2025-02-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950

--- Comment #5 from Robin Dapp  ---
Yeah, the original statement is recognized as a mask conversion pattern:

pr118950.c:9:21: note:   vect_recog_mask_conversion_pattern: detected: _152 =
.MASK_LOAD (_230, 8B, _229, 0);
pr118950.c:9:21: note:   mask_conversion pattern recognized: patt_355 =
.MASK_LOAD (_230, 8B, patt_54, 0);

but also as a scatter/gather:

pr118950.c:9:21: note:   gather/scatter pattern: detected: _152 = .MASK_LOAD
(_230, 8B, _229, 0);
pr118950.c:9:21: note:   gather_scatter pattern recognized: patt_375 =
.MASK_LEN_GATHER_LOAD ((sizetype) _215 + 20, _85, 1, 0, _229, 0);

The type of _152 is _Bool but patt_375's type is unsigned char.  With unsigned
char the presence of padding bits is not obvious and we should have looked at
_152's type.

[Bug tree-optimization/118954] [15 regression] Miscompile at -O3 since r15-1757-g4d24159a1fcb15

2025-02-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118954

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #11 from Richard Biener  ---
I will have a look.

[Bug c/118953] New: Miscompile at -O3

2025-02-20 Thread yunboni at smail dot nju.edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118953

Bug ID: 118953
   Summary: Miscompile at -O3
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yunboni at smail dot nju.edu.cn
  Target Milestone: ---

This code prints 45 at -O3 and 0 at -O0/1/2/s:

int printf(const char *, ...);
int a, d;
long b, c;
int e(int f, int g, unsigned long h, long j) {
  unsigned long i = 0;
  if (g)
    switch (f) {
    case 8:
      i = b;
      break;
    case 6:
      i = c;
    }
  else
    switch (f) {
    case 8:
      i = h;
      break;
    case 24:
    case 32:
      i = j;
    }
  return i;
}
int main() {
  int k = a * (409628 - 28);
  d = e(k - 1048524, 0, k - 1048487, (unsigned long)k - 1048531);
  printf("%d\n", d);
}

Compiler Explorer: https://godbolt.org/z/YTWjbWe3s

Bisected to
https://github.com/gcc-mirror/gcc/commit/602e824eec30a7c6792b8b27d61c40f1c1a2714c

[Bug libstdc++/118494] std::counting_semaphore should work

2025-02-20 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118494

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |16.0

[Bug libstdc++/99552] FAIL: 29_atomics/atomic/wait_notify/bool.cc (test for excess errors)

2025-02-20 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99552

Jonathan Wakely  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115955
   Target Milestone|--- |16.0

[Bug libstdc++/110854] constructor of std::counting_semaphore is not constexpr

2025-02-20 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110854

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|15.0|16.0

[Bug tree-optimization/118954] [15 regression] Miscompile at -O3

2025-02-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118954

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
I can't reproduce, not even with exactly the godbolt revision.

[Bug tree-optimization/114999] A few missing optimizations due to `a - b` and `b - a` not being detected as negatives of each other

2025-02-20 Thread jschmitz at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114999

--- Comment #13 from Jennifer Schmitz  ---
Created attachment 60540
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60540&action=edit
Patch for improving codegen of absolute differences of unsigned integers in
aarch64

This patch builds on top of the previous one, improving codegen for the same
test cases for unsigned integers (32-bit and 64-bit) for aarch64. The patch
adds a new define_insn_and_split pattern in the aarch64 backend.

[Bug target/118844] Link failure caused by crtbeginS.o

2025-02-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118844

--- Comment #3 from GCC Commits  ---
The releases/gcc-14 branch has been updated by LuluCheng:

https://gcc.gnu.org/g:9ffecde121af883b60bbe60d00425036bc873048

commit r14-11321-g9ffecde121af883b60bbe60d00425036bc873048
Author: Lulu Cheng 
Date:   Wed Feb 12 14:29:58 2025 +0800

LoongArch: Fix the issue of function jump out of range caused by
crtbeginS.o [PR118844].

Due to the presence of R_LARCH_B26 in
/usr/lib/gcc/loongarch64-linux-gnu/14/crtbeginS.o, its addressing
range is [PC-128MiB, PC+128MiB-4]. This means that when the code
segment size exceeds 128MB, linking with lld will definitely fail
(ld will not fail because the order of the two is different).

The linking order:
  lld: crtbeginS.o + .text + .plt
  ld : .plt + crtbeginS.o + .text

To solve this issue, add '-mcmodel=extreme' when compiling crtbeginS.o.

PR target/118844

libgcc/ChangeLog:

* config/loongarch/t-crtstuff: Add '-mcmodel=extreme'
to CRTSTUFF_T_CFLAGS_S.

(cherry picked from commit ae14d7d04da8c6cb542269722638071f999f94d8)
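
The addressing range quoted in the commit message, [PC-128MiB, PC+128MiB-4],
can be written as a small predicate. This is a sketch, not linker code: the
helper name is hypothetical, and the 4-byte alignment check is an assumption
based on the branch offset being counted in instruction-sized units.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024LL * 1024LL)

/* 128 MiB is 2^27 bytes: a signed 26-bit field counted in 4-byte units. */
static_assert(128 * MIB == (1LL << 27), "128 MiB is 2^27 bytes");

/* Nonzero if a branch at pc can reach target within the range quoted in
   the commit message.  Alignment to 4 bytes is assumed, see above. */
int b26_reachable(int64_t pc, int64_t target)
{
    int64_t off = target - pc;
    if (off % 4 != 0)
        return 0;
    return off >= -128 * MIB && off <= 128 * MIB - 4;
}
```

With lld's layout (crtbeginS.o before .text and .plt), a call from
crtbeginS.o has to jump across the whole of .text, so once .text exceeds
128 MiB the predicate fails for that call.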

[Bug tree-optimization/118953] [14/15 regression] Miscompile at -O2 since r14-2473-g602e824eec30a7

2025-02-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118953

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Priority|P3  |P2
   Last reconfirmed||2025-02-20
 Status|UNCONFIRMED |NEW

--- Comment #3 from Richard Biener  ---
Confirmed.

  # RANGE [irange] int [-INF, +INF] MASK 0xc000 VALUE 0x34
  _7 = k_11 + -1048524;
  switch (_7)  [33.33%], case 8:  [33.33%], case 24: 
[33.33%], case 32:  [33.33%]>

(k_11 is zero at runtime).  EVRP then makes

Global Exported: _7 = [irange] int [-INF, 2146435123] MASK 0xc000 VALUE
0x34
Global Exported: i_20 = [irange] long unsigned int [45, 45] MASK
0xc07d VALUE 0x0

  # RANGE [irange] int [-INF, 2146435123] MASK 0xc000 VALUE 0x34
  _7 = k_11 + -1048524;
  d = 45;

out of this.
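
A worked check of where 45 can come from. This is one reading of the dump
and the testcase, not an analysis taken from the thread, and the helper names
are made up. With a == 0, k is 0, the switch operand is -1048524, no label
matches, and i stays 0. The value 45 is what h would be on the path where the
operand equals 8, and that path needs a k that is not a multiple of 409600.

```c
#include <assert.h>

/* k = a * (409628 - 28); the multiplier is 409600 == 25 * 2^14. */
#define STEP (409628L - 28L)
static_assert(STEP == 25L * 16384L, "multiplier has 14 trailing zero bits");

/* Value k would need for the switch operand (k - 1048524) to hit a label. */
long k_for_label(long label) { return label + 1048524L; }

/* Third argument of e() as computed in main(): k - 1048487. */
long h_for_k(long k) { return k - 1048487L; }

/* Nonzero when the low 14 bits of k are clear, which every multiple of
   STEP satisfies (also after wrapping to 32 bits). */
int low14_clear(long k) { return (k & 0x3fffL) == 0; }
```

Here h_for_k(k_for_label(8)) evaluates to 45, while low14_clear() is false for
the k values required by labels 8, 24 and 32, so none of those paths is
reachable from main() and 0 is the expected output.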

[Bug c++/118951] FILE inserts the filename as array, __builtin_FILE as pointer

2025-02-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118951

--- Comment #1 from Richard Biener  ---
We can't change the signature of builtins.  Also there's nothing like an array
return value for functions in C or C++?
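
The difference the report is about can be seen with sizeof. A minimal
sketch, written in C although the bug is filed against the C++ front end;
the function names are hypothetical. The __FILE__ macro expands to a string
literal, which is an array, while __builtin_FILE () is a function call that
yields a const char *.

```c
#include <stddef.h>
#include <string.h>

/* sizeof on the string literal gives the array size, terminator included. */
size_t file_macro_size(void) { return sizeof(__FILE__); }

/* Length of the same literal, for comparison with the size above. */
size_t file_macro_len(void) { return strlen(__FILE__); }

/* sizeof on the builtin call gives the size of a pointer; the call itself
   is not evaluated. */
size_t builtin_file_size(void) { return sizeof(__builtin_FILE()); }
```

The first function returns the file name length plus one, and the last one
returns sizeof(const char *) regardless of the file name.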

[Bug tree-optimization/118954] [15 regression] Miscompile at -O3

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118954

--- Comment #3 from Sam James  ---
I can (with -fno-ssp), so bisecting.

[Bug tree-optimization/118953] [14/15 regression] Miscompile at -O2 since r14-2473-g602e824eec30a7

2025-02-20 Thread yunboni at smail dot nju.edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118953

--- Comment #2 from Yunbo Ni  ---
(In reply to Sam James from comment #1)
> I get '45' at -O2 and -O3 locally and on godbolt, but -O1 shows 0.

Yes, you're right. I mistakenly wrote the result from the case before it was
reduced. Sorry about that.

[Bug tree-optimization/118953] [14/15 regression] Miscompile at -O2 since r14-2473-g602e824eec30a7

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118953

Sam James  changed:

   What|Removed |Added

Summary|[14/15 regression]  |[14/15 regression]
   |Miscompile at -O3 since |Miscompile at -O2 since
   |r14-2473-g602e824eec30a7|r14-2473-g602e824eec30a7
   Target Milestone|--- |14.3

[Bug tree-optimization/118953] [14/15 regression] Miscompile at -O3 since r14-2473-g602e824eec30a7

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118953

Sam James  changed:

   What|Removed |Added

 CC||aldyh at gcc dot gnu.org,
   ||amacleod at redhat dot com
Summary|Miscompile at -O3   |[14/15 regression]
   ||Miscompile at -O3 since
   ||r14-2473-g602e824eec30a7
   Keywords||wrong-code
  Component|c   |tree-optimization

--- Comment #1 from Sam James  ---
I get '45' at -O2 and -O3 locally and on godbolt, but -O1 shows 0.

> Bisected to
> https://github.com/gcc-mirror/gcc/commit/
> 602e824eec30a7c6792b8b27d61c40f1c1a2714c

r14-2473-g602e824eec30a7

[Bug target/118949] RISC-V: Extra FRM writes since GCC-14.2

2025-02-20 Thread pan2.li at intel dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118949

Li Pan  changed:

   What|Removed |Added

 CC||pan2.li at intel dot com

--- Comment #2 from Li Pan  ---
There are minor changes to FRM handling in mode-switching from gcc-14 to
gcc-15; one related change is adding FRM to global_reg, because the
late-combine pass introduced in gcc-15 would otherwise delete a necessary FRM
back insn, as FRM isn't live from the entry.

As for LLVM, AFAIK when rounding autovec support was added to GCC, LLVM did
not support all cases, unless it has gained some optimization recently.

I will take a look at whether this is related to FRM or whether there is
something we can do here, and keep you posted.

[Bug tree-optimization/118521] [15 regression] std::vector Wstringop-overflow false positive since r15-4473

2025-02-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118521

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #10 from Richard Biener  ---
(In reply to Richard Biener from comment #9)
[...] 
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 9fd85a35a63..c02fd0e186d 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -346,9 +346,10 @@ along with GCC; see the file COPYING3.  If not see
>   form if possible.  */
>NEXT_PASS (pass_thread_jumps, /*first=*/false);
>NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> -  NEXT_PASS (pass_strlen);
>NEXT_PASS (pass_thread_jumps_full, /*first=*/false);
>NEXT_PASS (pass_vrp, true /* final_p */);
> +  NEXT_PASS (pass_forwprop, /*last=*/true);
> +  NEXT_PASS (pass_strlen);
>/* Run CCP to compute alignment and nonzero bits.  */
>NEXT_PASS (pass_ccp, true /* nonzero_p */);
>NEXT_PASS (pass_warn_restrict);
> @@ -356,7 +357,6 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_dce, true /* update_address_taken_p */, true /*
> remove_unused_locals */);
>/* After late DCE we rewrite no longer addressed locals into SSA
>  form if possible.  */
> -  NEXT_PASS (pass_forwprop, /*last=*/true);
>NEXT_PASS (pass_sink_code, true /* unsplit edges */);
>NEXT_PASS (pass_phiopt, false /* early_p */);
>NEXT_PASS (pass_fold_builtins);
>

Causes

+FAIL: c-c++-common/Wstringop-overflow.c  -std=gnu++17  (test for warnings,
line 93)
+FAIL: c-c++-common/Wstringop-overflow.c  -std=gnu++17  (test for warnings,
line 94)
...
+FAIL: gcc.dg/strlenopt-3.c scan-tree-dump-times optimized "return 0" 3
+FAIL: gcc.dg/strlenopt-45.c (test for excess errors)
+FAIL: gcc.dg/strlenopt-45.c scan-tree-dump-times optimized
"call_in_true_branch_not_eliminated_" 0
+FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "_not_eliminated_"
0
+FAIL: gcc.dg/strlenopt-70.c scan-tree-dump-times optimized "strlen" 0
+FAIL: gcc.dg/strlenopt-73.c scan-tree-dump-times optimized "_not_eliminated_"
0
+FAIL: gcc.dg/strlenopt-73.c scan-tree-dump-times optimized "strlen" 0
+FAIL: gcc.dg/strlenopt-77.c scan-tree-dump-times optimized
"call_in_true_branch_not_eliminated_" 0
+FAIL: gcc.dg/strlenopt-80.c (test for excess errors)
+FAIL: gcc.dg/strlenopt-80.c scan-tree-dump-times optimized "failure_on_line
(" 0
+FAIL: gcc.dg/strlenopt-91.c scan-tree-dump-not optimized "abort"
+FAIL: gcc.dg/tree-ssa/builtin-snprintf-3.c scan-tree-dump-not optimized
"failure_range"
+FAIL: gcc.dg/tree-ssa/builtin-snprintf-7.c scan-tree-dump-times optimized
"_not_eliminated" 0
+FAIL: gcc.dg/tree-ssa/builtin-snprintf-8.c scan-tree-dump-not optimized
"abort"
+FAIL: gcc.dg/tree-ssa/builtin-snprintf-9.c scan-tree-dump-not optimized
"abort"
+FAIL: gcc.dg/tree-ssa/builtin-sprintf-4.c scan-tree-dump-not optimized
"failure_on_line"
+FAIL: gcc.dg/tree-ssa/builtin-sprintf-9.c scan-tree-dump-times optimized
"call_in_true_branch_not_eliminated_" 0
+FAIL: gcc.dg/tree-ssa/builtin-sprintf.c (test for excess errors)
+UNRESOLVED: gcc.dg/tree-ssa/builtin-sprintf.c compilation failed to produce
executable
+FAIL: gcc.dg/tree-ssa/pr79327-2.c scan-tree-dump-not optimized
"failure_on_line"
+FAIL: gcc.dg/tree-ssa/pr83198.c scan-tree-dump-not optimized "link_error
();"


> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index 0ba85917d41..a0d1c2f3d86 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -284,6 +284,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-into-ssa.h"
>  #include "builtins.h"
>  #include "case-cfn-macros.h"
> +#include "tree-eh.h"
>  
>  static tree analyze_scalar_evolution_1 (class loop *, tree);
>  static tree analyze_scalar_evolution_for_address_of (class loop *loop,
> @@ -3947,6 +3948,19 @@ final_value_replacement_loop (class loop *loop)
>   print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (rslt), 0);
>   fprintf (dump_file, "\n");
> }
> +
> +  if (! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
> +   {
> + gimple *use_stmt;
> + imm_use_iterator imm_iter;
> + FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, rslt)
> +   {
> + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> + if (!stmt_can_throw_internal (cfun, use_stmt)
> + && fold_stmt (&gsi, follow_all_ssa_edges))
> +   update_stmt (gsi_stmt (gsi));
> +   }
> +   }
>  }
>  
>return any;
> 
> this should have the least chance of regressing things.  I'll report results.

This OTOH works fine, so posted for review.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #11 from Sam James  ---
(In reply to Richard Biener from comment #10)
> So how does one go to try reproducing this?  Does it show up when building
> emacs itself?

Yes. If you build Emacs with ./configure --with-native-compilation, it should
happen (it may need --with-native-compilation=aot in order to pre-compile more)
just on `make`. No need to run Emacs manually or install it.
