[Bug c++/113111] New: -Werror=uninitialized is not consistent for optimization level 0 or -std=before-c++20

2023-12-22 Thread MikeSmith32564 at mail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113111

Bug ID: 113111
   Summary: -Werror=uninitialized is not consistent for
optimization level 0 or -std=before-c++20
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: MikeSmith32564 at mail dot com
  Target Milestone: ---

The following snippet:



#include <iostream>
#include <string>

struct Dummy
{
std::string val;
};

int main()
{
Dummy d =
{
d.val = "random text
#"
};

std::cout<::_M_string_length' is used
uninitialized [-Werror=uninitialized]
 1084 |   { return _M_string_length; }
  |^~~~

The example triggers undefined behavior when executed of course due to running
std::string::operator= with the this pointer pointing to uninitialized memory
but I believe a warning should be raised when building regardless of the
optimization level selected.

Strangely, standards before c++20 also do not report the warning.

[Bug testsuite/113005] 'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113005

--- Comment #5 from Jakub Jelinek  ---
I can confirm that on x86_64-linux with 16 cores/32 threads even the -O0
rwlock_1 and rwlock_3 tests aren't that slow, but with OMP_NUM_THREADS=1024
and higher rwlock_1 STOPs:
$ OMP_NUM_THREADS=256 LD_LIBRARY_PATH=../.libs/ time ./rwlock_1.exe
2.21user 8.88system 0:00.88elapsed 1260%CPU (0avgtext+0avgdata
47324maxresident)k
0inputs+206848outputs (0major+29321minor)pagefaults 0swaps
$ OMP_NUM_THREADS=512 LD_LIBRARY_PATH=../.libs/ time ./rwlock_1.exe
4.59user 14.41system 0:02.29elapsed 829%CPU (0avgtext+0avgdata
89464maxresident)k
0inputs+413696outputs (0major+55232minor)pagefaults 0swaps
$ OMP_NUM_THREADS=1024 LD_LIBRARY_PATH=../.libs/ time ./rwlock_1.exe
STOP 2
STOP 2
Command exited with non-zero status 2
0.04user 0.78system 0:00.14elapsed 586%CPU (0avgtext+0avgdata
18976maxresident)k
0inputs+1672outputs (2major+4138minor)pagefaults 0swaps
$ OMP_NUM_THREADS=1024 LD_LIBRARY_PATH=../.libs/ time ./rwlock_3.exe
13.57user 49.00system 0:17.66elapsed 354%CPU (0avgtext+0avgdata
26588maxresident)k
0inputs+0outputs (1major+5987minor)pagefaults 0swaps
$ OMP_NUM_THREADS=2048 LD_LIBRARY_PATH=../.libs/ time ./rwlock_3.exe
38.15user 134.83system 0:51.26elapsed 337%CPU (0avgtext+0avgdata
45860maxresident)k
0inputs+0outputs (0major+10844minor)pagefaults 0swaps

[Bug testsuite/113005] 'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113005

--- Comment #6 from Jakub Jelinek  ---
But guess another case is on a loaded system.
When building on Fedora for distro builds on all of aarch64, powerpc64le, s390x
or x86_64 I see:
+FAIL: libgomp.fortran/rwlock_1.f90   -O0  execution test
+FAIL: libgomp.fortran/rwlock_1.f90   -O1  execution test
+FAIL: libgomp.fortran/rwlock_1.f90   -O2  execution test
+FAIL: libgomp.fortran/rwlock_1.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
+FAIL: libgomp.fortran/rwlock_1.f90   -O3 -g  execution test
+FAIL: libgomp.fortran/rwlock_1.f90   -Os  execution test
and in all those cases it was make -j{8,6,3,8} check.  Though, for number of
threads <= 8 we don't do the OMP_NUM_THREADS limiting and so it uses the
defaults.
Anyway, see e.g.
https://kojipkgs.fedoraproject.org//work/tasks/4298/110644298/build.log
and one can uudecode the make check logs from that file.

[Bug testsuite/113005] 'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113005

--- Comment #7 from Jakub Jelinek  ---
wget .../build.log; sed -n /^begin/,/^end/p build.log | uudecode

[Bug testsuite/113005] 'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113005

--- Comment #8 from Jakub Jelinek  ---
Execution timeout is: 300
spawn [open ...]
STOP 2
Internal Error: Trying to free nonempty asynchronous unit

Error termination. Backtrace:
Internal Error: Trying to free nonempty asynchronous unit

Error termination. Backtrace:
#0  0x7f986429289e in free_async_unit
at ../../../libgfortran/io/async.c:211
#1  0x7f9864288729 in close_unit_1
at ../../../libgfortran/io/unit.c:759
#2  0x401644 in ???
#3  0x7f986432cd7d in ???
#4  0x7f9863e87c90 in ???
#5  0x7f9863f07fbb in ???
#6  0x in ???
FAIL: libgomp.fortran/rwlock_1.f90   -O0  execution test
is what I see in the log.  Or just STOP 2.

[Bug c/113112] New: RISC-V: Dynamic LMUL feature stabilization for GCC-14 release

2023-12-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112

Bug ID: 113112
   Summary: RISC-V: Dynamic LMUL feature stabilization for GCC-14
release
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Created attachment 56922
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56922&action=edit
dynamic LMUL fail case

Hi, as we know, we support the dynamic LMUL feature, but it is not yet stable.

As far as I know, we only have these 2 execution FAILs:
FAIL: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/signbit-5.c execution test
in full coverage testing. They are not real FAILs; the tests need to be
adjusted.

I have tested on K230 and other hardware, and it turns out we get over 30%
performance improvement (compared with the default LMUL = M1) on various
benchmarks if we can select a reasonably big LMUL (no additional register
spills).

However, I also find that some benchmarks have a significant performance
drop (compared with the default LMUL = M1) when using dynamic LMUL.
I am pretty sure this is because we pick a wrong, too big LMUL (LMUL > 1),
which causes additional register spills, so we get bad performance in such
situations.

For example:

#include 

#define N 40

int a[N];

__attribute__ ((noinline)) int
foo (int n){
  int i,j;
  int sum,x;

  for (i = 0; i < n; i++) {
sum = 0;
for (j = 0; j < n; j++) {
  sum += (i + j);
}
a[i] = sum;
  }
  return 0;
}

-march=rv64gcv -mabi=lp64d -O3 --param riscv-autovec-lmul=dynamic --param
riscv-autovec-preference=fixed-vlmax

ASM:

foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addi    sp,sp,-128
addi    a2,a2,%lo(.LANCHOR0)
mv  a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v   v8
vs8r.v  v8,0(sp)
.L3:
vl8re32.v   v16,0(sp)
vsetvli a4,a1,e8,m2,ta,ma
li  a3,0
vsetvli a5,zero,e32,m8,ta,ma
vmv8r.v v0,v16
vmv.v.x v8,a4
vmv.v.i v24,0
vadd.vv v8,v16,v8
vmv8r.v v16,v24
vs8r.v  v8,0(sp)
.L4:
addiw   a3,a3,1
vadd.vv v8,v0,v16
vadd.vi v16,v16,1
vadd.vv v24,v24,v8
bne a0,a3,.L4
vsetvli zero,a4,e32,m8,ta,ma
sub a1,a1,a4
vse32.v v24,0(a2)
slli    a4,a4,2
add a2,a2,a4
bne a1,zero,.L3
li  a0,0
addi    sp,sp,128
jr  ra
.L11:
li  a0,0
ret

As we can see, it picks LMUL = 8 and then spills.

This case was found by the following code, which I added to the mov pattern:

  if (known_gt (GET_MODE_SIZE (mode), BYTES_PER_RISCV_VECTOR)
  && riscv_autovec_lmul == RVV_DYNAMIC && lra_in_progress)
gcc_unreachable ();


The attachment is a file showing the cases where we pick an incorrect, too big
LMUL, which causes additional spills.

I will work on this issue in the following days.

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #1 from Jakub Jelinek  ---
Any progress with the bisection?
Or at least details what exactly are you compiling (with what patches etc.)?

[Bug tree-optimization/110389] [12/13/14 Regression] wrong code at -Os and -O2 with "-fno-tree-ch -fno-expensive-optimizations -fno-ivopts -fno-tree-loop-ivcanon" on x86_64-linux-gnu

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110389

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aldyh at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Started with r12-4526-gd8edfadfc7a9795b65177a50ce44fd348858e844

[Bug testsuite/113005] 'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts

2023-12-22 Thread lipeng.zhu at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113005

--- Comment #9 from Lipeng Zhu  ---
Since I still can't reproduce the failure on my side :(, just curious: do the
newly added 'rwlock' test cases also fail with the mutex lock?

[Bug c++/113110] GCC rejects call to more specialized const char array version with string literal

2023-12-22 Thread harald at gigawatt dot nl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113110

Harald van Dijk  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |harald at gigawatt dot nl

--- Comment #6 from Harald van Dijk  ---
(In reply to Jason Liam from comment #3)
> Are you sure? I mean if you add another template parameter `U` to the second
> parameter and use it then gcc starts accepting the code and using the more
> specialized version. Demo:https://godbolt.org/z/W7Ma6c5Ts

In order to determine whether int compare(const char (&)[N], const char (&)[M])
is more specialised than int compare(const T &, const T &), template argument
deduction is attempted to solve one in terms of the other. Solving gives T =
char[N] for the first parameter, but T = char[M] for the second parameter. This
is a conflict that is ignored by MSVC.

When adding the second template parameter `U`, solving gives T = char[N], and U
= char[M]. This is never a conflict, this unambiguously makes the array version
more specialised.

(In reply to Andrew Pinski from comment #5)
> I am still suspecting MSVC of not implementing the C++ Defect report 214 .

At first glance, both GCC/clang and MSVC behaviour look like legitimate but
different interpretations of DR 214. Whether MSVC is right to ignore that
conflict is an open question covered by
https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2160.

[Bug testsuite/113005] 'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test timeouts

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113005

--- Comment #10 from Jakub Jelinek  ---
For the distro builds the log files are all I have (all logs in
https://koji.fedoraproject.org/koji/taskinfo?taskID=110644223 )
For what I can reproduce on my box (rwlock_1.exe built in the
x86_64-pc-linux-gnu/libgomp/testsuite subdirectory using the -O0
compilation line from libgomp*/*.sep:
$ OMP_NUM_THREADS=4096 LD_LIBRARY_PATH=../.libs/ time ./rwlock_1.exe
STOP 2
At line 28 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
2187, file = '***_tst.dat')
Fortran runtime error: End of file
STOP 2
Command exited with non-zero status 2
0.11user 1.19system 0:00.19elapsed 691%CPU (0avgtext+0avgdata
51824maxresident)k
0inputs+368outputs (0major+12303minor)pagefaults 0swaps
$ OMP_NUM_THREADS=4096 LD_LIBRARY_PATH=../.libs/ time ./rwlock_1.exe
STOP 2
STOP 2
Command exited with non-zero status 2
0.09user 1.03system 0:00.17elapsed 650%CPU (0avgtext+0avgdata
49988maxresident)k
0inputs+288outputs (0major+11835minor)pagefaults 0swaps
$ OMP_NUM_THREADS=4096 LD_LIBRARY_PATH=../.libs/ time ./rwlock_1.exe
STOP 2
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
3798)
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
3151)
Internal Error: Unit number changed
Internal Error: Unit number changed

Error termination. Backtrace:

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit = 795)
Internal Error: Unit number changed

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit = 46)
Internal Error: Unit number changed

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
2386)
Internal Error: Unit number changed

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
1841)
Fortran runtime error: Cannot open file '***_tst.dat': Too many open files

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
3692)
Fortran runtime error: Cannot open file '***_tst.dat': Too many open files

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
3880)
Internal Error: Unit number changed

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit = 947)
Internal Error: Unit number changed

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
2930)
Internal Error: Unit number changed

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit = 565)
Internal Error: Unit number changed

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit = 323)
Internal Error: Unit number changed

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
1651)
Fortran runtime error: Cannot open file '***_tst.dat': Too many open files

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
1684)
Fortran runtime error: Cannot open file '***_tst.dat': Too many open files
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
3659)

Error termination. Backtrace:
Fortran runtime error: Cannot open file '***_tst.dat': Too many open files

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit = 846)
Fortran runtime error: Cannot open file '846_tst.dat': Too many open files

Error termination. Backtrace:
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
2224)
Fortran runtime error: Cannot open file '***_tst.dat': Too many open files

Error termination. Backtrace:

Could not print backtrace: /proc/self/exe: Too many open files
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
3913)
Fortran runtime error: Cannot open file '***_tst.dat': Too many open files

Error termination. Backtrace:
#0  0x7f9728be6a3e
#1  0x7f9728be7509
#2  0x7f9728be805f
#0  0x7f9728be6a3e
#3  0x7f9728e29128
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit =
1183)
#1  0x7f9728be7509
#4  0x7f9728e2940d
Fortran runtime error: Cannot open file '***_tst.dat': Too many open files
#2  0x7f9728be805f
#5  0x40140f
At line 17 of file
/home/jakub/src/gcc/libgomp/testsuite/libgomp.fortran/rwlock_1.f90 (unit = 860)

Error termination. Backtrace:
#3  0x7f9728e29128

[Bug tree-optimization/113102] during GIMPLE pass: bitintlower ICE: SIGSEGV with _BitInt() at -O1 or -O2

2023-12-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113102

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:d3defa435e9d04d6ab6585ac184989941c7ad51e

commit r14-6803-gd3defa435e9d04d6ab6585ac184989941c7ad51e
Author: Jakub Jelinek 
Date:   Fri Dec 22 12:27:05 2023 +0100

lower-bitint: Fix handle_cast ICE [PR113102]

My recent change to use m_data[save_data_cnt] instead of
m_data[save_data_cnt + 1] when inside of a loop (m_bb is non-NULL)
broke the following testcase.  When we create a PHI node on the loop
using prepare_data_in_out, both m_data[save_data_cnt{, + 1}] are
computed and the fix was right, but there are also cases when we in
a loop (m_bb non-NULL) emit a nested cast with too few limbs and
then just use constant indexes for all accesses - in that case
only m_data[save_data_cnt + 1] is initialized and m_data[save_data_cnt]
is NULL.  In those cases, we want to use the former.

2023-12-22  Jakub Jelinek  

PR tree-optimization/113102
* gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Only
use m_data[save_data_cnt] if it is non-NULL.

* gcc.dg/bitint-58.c: New test.

[Bug tree-optimization/113102] during GIMPLE pass: bitintlower ICE: SIGSEGV with _BitInt() at -O1 or -O2

2023-12-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113102

--- Comment #5 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:f5198f0264e773d3b5d55f09a579313b0b231527

commit r14-6804-gf5198f0264e773d3b5d55f09a579313b0b231527
Author: Jakub Jelinek 
Date:   Fri Dec 22 12:28:06 2023 +0100

lower-bitint: Handle unreleased SSA_NAMEs from earlier passes gracefully
[PR113102]

On the following testcase earlier passes leave around an unreleased
SSA_NAME - non-GIMPLE_NOP SSA_NAME_DEF_STMT which isn't in any bb.
The following patch makes bitint lowering resistant against those,
the first hunk is where we'd for certain kinds of stmts try to amend
them and the latter is where we'd otherwise try to remove them,
neither of which works.  The other loops over all SSA_NAMEs either
already also check gimple_bb (SSA_NAME_DEF_STMT (s)) or it doesn't
matter that much if we process it or not (worst case it means e.g.
the pass wouldn't return early even when it otherwise could).

2023-12-22  Jakub Jelinek  

PR tree-optimization/113102
* gimple-lower-bitint.cc (gimple_lower_bitint): Handle unreleased
large/huge _BitInt SSA_NAMEs.

* gcc.dg/bitint-59.c: New test.

[Bug tree-optimization/112941] during GIMPLE pass: bitintlower ICE: in handle_operand_addr, at gimple-lower-bitint.cc:2126 (gimple-lower-bitint.cc:2134) at -O with _BitInt()

2023-12-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112941

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:0a6aa1927597d821a85bc3d1fd7682256c25b548

commit r14-6805-g0a6aa1927597d821a85bc3d1fd7682256c25b548
Author: Jakub Jelinek 
Date:   Fri Dec 22 12:28:54 2023 +0100

symtab-thunks: Use aggregate_value_p even on is_gimple_reg_type returns
[PR112941]

Large/huge _BitInt types are returned in memory and the bitint lowering
pass right now relies on that.
The gimplification etc. use aggregate_value_p to see if it should be
returned in memory or not and use
  <retval> = _123;
  return <retval>;
rather than
  return _123;
But expand_thunk used e.g. by IPA-ICF was performing an optimization,
assuming is_gimple_reg_type is always passed in registers and not calling
aggregate_value_p in that case.  The following patch changes it to match
what the gimplification etc. are doing.

2023-12-22  Jakub Jelinek  

PR tree-optimization/112941
* symtab-thunks.cc (expand_thunk): Check aggregate_value_p
regardless
of whether is_gimple_reg_type (restype) or not.

* gcc.dg/bitint-60.c: New test.

[Bug rtl-optimization/112758] [13/14 Regression] Inconsistent Bitwise AND Operation Result between int and long long int

2023-12-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112758

--- Comment #17 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:cefae511ed7fa34ef6d24b67a7bc305459bf10e8

commit r14-6806-gcefae511ed7fa34ef6d24b67a7bc305459bf10e8
Author: Jakub Jelinek 
Date:   Fri Dec 22 12:29:34 2023 +0100

combine: Don't optimize paradoxical SUBREG AND CONST_INT on
WORD_REGISTER_OPERATIONS targets [PR112758]

As discussed in the PR, the following testcase is miscompiled on RISC-V
64-bit, because num_sign_bit_copies in one spot pretends the bits in
a paradoxical SUBREG beyond SUBREG_REG SImode are all sign bit copies:
      /* For paradoxical SUBREGs on machines where all register operations
         affect the entire register, just look inside.  Note that we are
         passing MODE to the recursive call, so the number of sign bit
         copies will remain relative to that mode, not the inner mode.

         This works only if loads sign extend.  Otherwise, if we get a
         reload for the inner part, it may be loaded from the stack, and
         then we lose all sign bit copies that existed before the store
         to the stack.  */
      if (WORD_REGISTER_OPERATIONS
          && load_extend_op (inner_mode) == SIGN_EXTEND
          && paradoxical_subreg_p (x)
          && MEM_P (SUBREG_REG (x)))
and then optimizes based on that in one place, but then the
r7-1077 optimization triggers in and treats all the upper bits in
paradoxical SUBREG as undefined and performs based on that another
optimization.  The r7-1077 optimization is done only if SUBREG_REG
is either a REG or MEM, from the discussions in the PR seems that if
it is a REG, the upper bits in paradoxical SUBREG on
WORD_REGISTER_OPERATIONS targets aren't really undefined, but we can't
tell what values they have because we don't see the operation which
computed that REG, and for MEM it depends on load_extend_op - if
it is SIGN_EXTEND, the upper bits are sign bit copies and so something
not really usable for the optimization, if ZERO_EXTEND, they are zeros
and it is usable for the optimization, for UNKNOWN I think it is better
to punt as well.

So, the following patch basically disables the r7-1077 optimization
on WORD_REGISTER_OPERATIONS unless we know it is still ok for sure,
which is either if sub_width is >= BITS_PER_WORD because then the
WORD_REGISTER_OPERATIONS rules don't apply, or load_extend_op on a MEM
is ZERO_EXTEND.

2023-12-22  Jakub Jelinek  

PR rtl-optimization/112758
* combine.cc (make_compound_operation_int): Optimize AND of a SUBREG
based on nonzero_bits of SUBREG_REG and constant mask on
WORD_REGISTER_OPERATIONS targets only if it is a zero extending
MEM load.

* gcc.c-torture/execute/pr112758.c: New test.

[Bug c++/113113] New: False -Wmismatched-new-delete in case of destroying operator delete

2023-12-22 Thread fchelnokov at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113113

Bug ID: 113113
   Summary: False -Wmismatched-new-delete in case of destroying
operator delete
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fchelnokov at gmail dot com
  Target Milestone: ---

Consider the program as follows:


#include <new>

struct Shape {
    void operator delete(Shape *, std::destroying_delete_t);
};

struct Triangle : Shape {};

void Shape::operator delete(Shape *p, std::destroying_delete_t) {
    static_cast<Triangle *>(p)->~Triangle();
    ::operator delete(p);
}

int main() {
    Shape *p = new Triangle;
    delete p;
}


GCC issues presumably false

warning: 'static void Shape::operator delete(Shape*, std::destroying_delete_t)'
called on pointer returned from a mismatched allocation function
[-Wmismatched-new-delete]

for this program as well as for longer original one from
https://stackoverflow.com/a/67595790/7325599

Online demo: https://godbolt.org/z/Pfc4PaGbd

[Bug target/113114] New: ICE in try_promote_writeback aarch64-ldp-fusion.cc

2023-12-22 Thread fkastl at suse dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113114

Bug ID: 113114
   Summary: ICE in try_promote_writeback aarch64-ldp-fusion.cc
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code, needs-bisection
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fkastl at suse dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: aarch64-linux-gnu

While compiling the GCC testcase gcc.c-torture/execute/pr59643.c using an
aarch64 crosscompiler with these options

aarch64-linux-gnu-gcc
/home/worker/buildworker/tiber-option-juggler/build/gcc/testsuite/gcc.c-torture/execute/pr59643.c
-mabi=ilp32 -O2

the compiler runs into an ICE

during RTL pass: ldp_fusion
/home/worker/buildworker/tiber-option-juggler/build/gcc/testsuite/gcc.c-torture/execute/pr59643.c:
In function ‘foo’:
/home/worker/buildworker/tiber-option-juggler/build/gcc/testsuite/gcc.c-torture/execute/pr59643.c:11:1:
internal compiler error: in try_promote_writeback, at
config/aarch64/aarch64-ldp-fusion.cc:2604
   11 | }
  | ^
0x789ba3 try_promote_writeback
   
/home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/config/aarch64/aarch64-ldp-fusion.cc:2604
0x789ba3 ldp_fusion_bb(rtl_ssa::bb_info*)
   
/home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/config/aarch64/aarch64-ldp-fusion.cc:2635
0x11bc4a7 ldp_fusion()
   
/home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/config/aarch64/aarch64-ldp-fusion.cc:2655
0x11bc528 execute
   
/home/worker/buildworker/tiber-gcc-trunk-aarch64/build/gcc/config/aarch64/aarch64-ldp-fusion.cc:2705
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


Configuration of the compiler:

Using built-in specs.
COLLECT_GCC=/home/worker/cross/bin/aarch64-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/home/worker/cross/libexec/gcc/aarch64-linux-gnu/14.0.0/lto-wrapper
Target: aarch64-linux-gnu
Configured with:
/home/worker/buildworker/tiber-gcc-trunk-aarch64/build/configure
--enable-languages=c,c++,fortran,rust,m2 --disable-bootstrap
--disable-libsanitizer --disable-multilib --enable-checking=release
--prefix=/home/worker/cross --target=aarch64-linux-gnu
--with-as=/usr/bin/aarch64-suse-linux-as
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231221 (experimental)
ec2ec24a4d4d1175f72641a95010c2312eb38ccd (GCC)

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #2 from Sam James  ---
(In reply to Jakub Jelinek from comment #1)
> Any progress with the bisection?

sorry, not yet. I've been away from the computer mostly for an emergency.

I did make a start, but I got frustrated with how the Makefile deps seem broken
(replacing a .o and running 'make' doesn't trigger a rebuild correctly in all
cases).

> Or at least details what exactly are you compiling (with what patches etc.)?

I can reproduce it consistently with:
```
wget https://www.fftw.org/fftw-3.3.10.tar.gz
tar xvf fftw-3.3.10.tar.gz && cd fftw-3.3.10
./configure CFLAGS="-O3 -m32 -march=znver2 -ggdb3"
make -j$(nproc)
make -j$(nproc) check
```

Let me know if you need more.

[Bug target/113115] New: ICE In extract_constrain_insn_cached recog.cc with ppc64le-linux-gnu crosscompiler

2023-12-22 Thread fkastl at suse dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113115

Bug ID: 113115
   Summary: ICE In extract_constrain_insn_cached recog.cc with
ppc64le-linux-gnu crosscompiler
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code, needs-bisection
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fkastl at suse dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: ppc64le-linux-gnu

While compiling the GCC testcase gcc.target/powerpc/pr103627-3.c with the
ppc64le crosscompiler with these options:

ppc64le-linux-gnu-gcc
/home/worker/buildworker/tiber-option-juggler/build/gcc/testsuite/gcc.target/powerpc/pr103627-3.c
-mno-power8-vector

the compiler runs into an ICE

/home/worker/buildworker/tiber-option-juggler/build/gcc/testsuite/gcc.target/powerpc/pr103627-3.c:
In function ‘main’:
/home/worker/buildworker/tiber-option-juggler/build/gcc/testsuite/gcc.target/powerpc/pr103627-3.c:19:1:
error: insn does not satisfy its constraints:
   19 | }
  | ^
(insn 55 54 56 (set (reg:OO 32 0)
(mem/c:OO (plus:DI (reg/f:DI 9 9 [128])
(const_int 32 [0x20])) [2 c+32 S32 A256]))
"/home/worker/buildworker/tiber-option-juggler/build/gcc/testsuite/gcc.target/powerpc/pr103627-3.c":12:1
2172 {*movoo}
 (nil))
during RTL pass: shorten
/home/worker/buildworker/tiber-option-juggler/build/gcc/testsuite/gcc.target/powerpc/pr103627-3.c:19:1:
internal compiler error: in extract_constrain_insn_cached, at recog.cc:2725
0x654383 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
   
/home/worker/buildworker/tiber-gcc-trunk-ppc64le/build/gcc/rtl-error.cc:108
0x6543a9 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
   
/home/worker/buildworker/tiber-gcc-trunk-ppc64le/build/gcc/rtl-error.cc:118
0x6538ee extract_constrain_insn_cached(rtx_insn*)
   
/home/worker/buildworker/tiber-gcc-trunk-ppc64le/build/gcc/recog.cc:2725
0x1352337 insn_default_length(rtx_insn*)
   
/home/worker/buildworker/tiber-gcc-trunk-ppc64le/build/gcc/config/rs6000/rs6000.md:15156
0x8eada2 shorten_branches(rtx_insn*)
   
/home/worker/buildworker/tiber-gcc-trunk-ppc64le/build/gcc/final.cc:1089
0x8eaddf rest_of_handle_shorten_branches
   
/home/worker/buildworker/tiber-gcc-trunk-ppc64le/build/gcc/final.cc:4338
0x8eaddf execute
   
/home/worker/buildworker/tiber-gcc-trunk-ppc64le/build/gcc/final.cc:4367
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


Compiler configuration:

Using built-in specs.
COLLECT_GCC=/home/worker/cross/bin/ppc64le-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/home/worker/cross/libexec/gcc/ppc64le-linux-gnu/14.0.0/lto-wrapper
Target: ppc64le-linux-gnu
Configured with:
/home/worker/buildworker/tiber-gcc-trunk-ppc64le/build/configure
--enable-languages=c,c++,fortran,rust,m2 --disable-bootstrap
--disable-libsanitizer --disable-multilib --enable-checking=release
--prefix=/home/worker/cross --target=ppc64le-linux-gnu
--with-as=/usr/bin/powerpc64le-suse-linux-as
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231221 (experimental)
ec2ec24a4d4d1175f72641a95010c2312eb38ccd (GCC)

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-12-22 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Wilco  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0

--- Comment #16 from Wilco  ---
Fixed by
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=3fa689f6ed8387d315e58169bb9bace3bd508c0a

libatomic: Enable lock-free 128-bit atomics on AArch64

Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries (as for these GCC always calls into libatomic, so all 128-bit
atomic uses in a process are switched), gives better performance than locking
atomics and is what most users expect.

128-bit atomic loads use a load/store exclusive loop if LSE2 is not supported.
This results in an implicit store which is invisible to software as long as the
given address is writeable (which will be true when using atomics in real
code).

This doesn't yet change __atomic_is_lock_free even though all atomics are
finally lock-free on AArch64.

libatomic:
* config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0
atomics.
(libat_exchange_16): Merge RELEASE and ACQ_REL/SEQ_CST cases.
* config/linux/aarch64/host-config.h: Use atomic_16.S for baseline
v8.0.

[Bug target/113116] New: ~11-17% exec time regression of 436.cactusADM on aarch64

2023-12-22 Thread fkastl at suse dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113116

Bug ID: 113116
   Summary: ~11-17% exec time regression of 436.cactusADM on
aarch64
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization, needs-bisection
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fkastl at suse dot cz
  Target Milestone: ---
  Host: aarch64-linux-gnu
Target: aarch64-linux-gnu

As seen on the graphs here

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=578.100.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=586.100.0

between commits
g:8e0568d8ac9dbfc8
g:5641787abeea0fdc

there is a slowdown of 436.cactusADM SPEC2006 benchmark, 11% for Ofast native
LTO PGO and 17% for Ofast native LTO.

This is on aarch64. I haven't seen similar slowdowns on other architectures.

[Bug target/113116] ~11-17% exec time regression of 436.cactusADM on aarch64

2023-12-22 Thread fkastl at suse dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113116

Filip Kastl  changed:

   What|Removed |Added

 Blocks||26163

--- Comment #1 from Filip Kastl  ---
The cpu is ampere altra


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

Jakub Jelinek  changed:

   What|Removed |Added

 CC||fweimer at redhat dot com

--- Comment #3 from Jakub Jelinek  ---
Seems the package's configure is affected by most likely the modern C changes,
I see
--- config.h.good   2023-12-22 17:47:44.615207332 +0100
+++ config.h.bad    2023-12-22 17:46:42.304068624 +0100
@@ -37,7 +37,7 @@
 /* #undef F77_FUNC_ */

 /* Define if F77_FUNC and F77_FUNC_ are equivalent. */
-/* #undef F77_FUNC_EQUIV */
+#define F77_FUNC_EQUIV 1

 /* Define if F77 and FC dummy `main' functions are identical. */
 /* #undef FC_DUMMY_MAIN_EQ_F77 */
@@ -404,7 +404,7 @@

 /* Include g77-compatible wrappers in addition to any other Fortran wrappers.
*/
-/* #undef WITH_G77_WRAPPERS */
+#define WITH_G77_WRAPPERS 1

 /* Use our own aligned malloc routine; mainly helpful for Windows systems
lacking aligned allocation system-library routines. */
diff in config.h between my system gcc 12 and gcc 14 snapshot.
But that isn't the reason for the failure.

[Bug target/113114] [14 Regression] ICE compiling gcc.c-torture/execute/pr59643.cwith -mabi=ilp32; in try_promote_writeback aarch64-ldp-fusion.cc

2023-12-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113114

Andrew Pinski  changed:

   What|Removed |Added

   Keywords|needs-bisection |
Summary|ICE in  |[14 Regression] ICE
   |try_promote_writeback   |compiling
   |aarch64-ldp-fusion.cc   |gcc.c-torture/execute/pr596
   ||43.cwith -mabi=ilp32; in
   ||try_promote_writeback
   ||aarch64-ldp-fusion.cc
   Target Milestone|--- |14.0

--- Comment #1 from Andrew Pinski  ---
Most likely r14-6605-gc0911c6b357ba9 when aarch64-ldp-fusion.cc was introduced.

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #4 from Jakub Jelinek  ---
Bisection points to hc2cf2_16.o.

[Bug c++/113117] New: ambiguous call during operator overloading is not detected for templates

2023-12-22 Thread armagvvg at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113117

Bug ID: 113117
   Summary: ambiguous call during operator overloading is not
detected for templates
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: armagvvg at gmail dot com
  Target Milestone: ---

In a nutshell: the following code snippet compiles successfully, but according
to the language standard it should not:

struct S {
    template <typename T>
    S& operator<<(T) { return *this; }
};

template <typename T>
T& operator<<(T& s, int) { return s; }

int main () {
    S s;
    s << 1;
}

Details:

Operator overloading for classes can be implemented with free functions and
with class members. When both exist, the C++ standard (I used C++17, part 16.5.2
- Overloaded Operators -> Binary operators) says that "If both forms of the
operator function have been declared, the rules in 16.3.1.2 determine which, if
any, interpretation is used.". The set of candidates then contains both the
member and the free function. The member is treated as a free function with an
implicit first parameter (16.3.1 - Candidate functions and argument lists) - "a
member function is considered to have an extra parameter, called the implicit
object parameter, which represents the object for which the member function has
been called".

Thus we have two templates here:
1. template <typename T> T& operator<<(T&, int) - defined in the code
2. template <typename T> S& operator<<(S, T) - synthesized from the member
function

16.3.3 "Best viable function" mentions that "F1 and F2 are function template
specializations, and the function template for F1 is more specialized than the
template for F2 according to the partial ordering rules described in
17.5.6.2...". But neither template is more specialized for a call "operator<<(S,
int)". Thus we have two best viable functions and "If there is exactly one
viable function that is a better function than all other viable functions, then
it is the one selected by overload resolution; otherwise the call is
ill-formed".

This code snippet should be ill-formed, but GCC accepts it.

clang detects this as an error.
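
For comparison, here is a minimal sketch of a well-formed variant. It assumes
the free operator is changed to a non-template overload, which is not what the
report does; the counters member_calls and free_calls are hypothetical and only
record which candidate was picked. A non-template function is preferred over a
function template specialization when the conversions are otherwise equal, so
the partial ordering question never comes up:

```cpp
#include <cassert>

struct S {
    int member_calls = 0;  // hypothetical counter, for illustration only
    int free_calls = 0;    // hypothetical counter, for illustration only

    // Member template: candidate (S&, T) once the implicit object
    // parameter is taken into account.
    template <typename T>
    S& operator<<(T) { ++member_calls; return *this; }
};

// Non-template free overload: candidate (S&, int).
S& operator<<(S& s, int) { ++s.free_calls; return s; }
```

With an int argument both candidates are exact matches and the non-template
free function wins; with a double argument only the member template is an exact
match, so it is selected.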

[Bug c++/113117] ambiguous call during operator overloading is not detected for templates

2023-12-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113117

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||accepts-invalid,
   ||needs-bisection

--- Comment #1 from Andrew Pinski  ---
This seems to be fixed on the trunk.

[Bug c++/113117] ambiguous call during operator overloading is not detected for templates

2023-12-22 Thread armagvvg at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113117

--- Comment #2 from Vyacheslav Grigoryev  ---
Looks like it, checking on https://godbolt.org/z/vb6s6cY6Y. Hm... I wasted time
filing the bug :(

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #5 from Jakub Jelinek  ---
$ /opt/notnfs/gcc-bisect/obj/gcc/cc1.r14-6209 -quiet -nostdinc -O3 -m32
-march=znver2 -ggdb3 hc2cf2_16.i -o hc2cf2_16.s1
$ /opt/notnfs/gcc-bisect/obj/gcc/cc1.r14-6210 -quiet -nostdinc -O3 -m32
-march=znver2 -ggdb3 hc2cf2_16.i -o hc2cf2_16.s2
$ gcc -c -o /tmp/hc2cf2_16.o -xassembler /tmp/hc2cf2_16.s1 -m32
$ gcc -O3 -m32 -march=znver2 -ggdb3 -o bench2 bench-bench.o bench-hook.o
bench-fftw-bench.o /tmp/hc2cf2_16.o ../.libs/libfftw3.a
../libbench2/libbench2.a -lm; ./bench2 --verbose=1   --verify 'obc3x13'
--verify 'ibc3x13' --verify '//ifr4752'
obc3x13 2.86313e-16 2.84445e-16 7.6393e-16
ibc3x13 2.55967e-16 2.84445e-16 7.42983e-16
//ifr4752 2.74708e-16 1.85534e-15 9.24976e-16
$ gcc -c -o /tmp/hc2cf2_16.o -xassembler /tmp/hc2cf2_16.s2 -m32
$ gcc -O3 -m32 -march=znver2 -ggdb3 -o bench2 bench-bench.o bench-hook.o
bench-fftw-bench.o /tmp/hc2cf2_16.o ../.libs/libfftw3.a
../libbench2/libbench2.a -lm; ./bench2 --verbose=1   --verify 'obc3x13'
--verify 'ibc3x13' --verify '//ifr4752'
obc3x13 2.86313e-16 2.84445e-16 7.6393e-16
ibc3x13 2.55967e-16 2.84445e-16 7.42983e-16
corrupted size vs. prev_size
Aborted (core dumped)

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #6 from Jakub Jelinek  ---
Created attachment 56923
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56923&action=edit
hc2cf2_16.i.xz

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #7 from Jakub Jelinek  ---
With -g0 in addition the assembly difference is
--- hc2cf2_16.s1    2023-12-22 13:14:14.0 -0500
+++ hc2cf2_16.s2    2023-12-22 13:14:06.0 -0500
@@ -16,7 +16,6 @@ hc2cf2_16:
 .LCFI4:
        movl    552(%esp), %eax
        movl    528(%esp), %edx
-       movl    540(%esp), %esi
        movl    556(%esp), %ebp
        decl    %eax
        sall    $6, %eax

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread fw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

Florian Weimer  changed:

   What|Removed |Added

 CC||fw at gcc dot gnu.org

--- Comment #8 from Florian Weimer  ---
(In reply to Jakub Jelinek from comment #3)
> Seems the package's configure is affected by most likely the modern C
> changes,
> I see
> --- config.h.good 2023-12-22 17:47:44.615207332 +0100
> +++ config.h.bad  2023-12-22 17:46:42.304068624 +0100
> @@ -37,7 +37,7 @@
>  /* #undef F77_FUNC_ */
>  
>  /* Define if F77_FUNC and F77_FUNC_ are equivalent. */
> -/* #undef F77_FUNC_EQUIV */
> +#define F77_FUNC_EQUIV 1
>  
>  /* Define if F77 and FC dummy `main' functions are identical. */
>  /* #undef FC_DUMMY_MAIN_EQ_F77 */
> @@ -404,7 +404,7 @@
>  
>  /* Include g77-compatible wrappers in addition to any other Fortran
> wrappers.
> */
> -/* #undef WITH_G77_WRAPPERS */
> +#define WITH_G77_WRAPPERS 1
>  
>  /* Use our own aligned malloc routine; mainly helpful for Windows systems
> lacking aligned allocation system-library routines. */
> diff in config.h between my system gcc 12 and gcc 14 snapshot.
> But that isn't the reason for the failure.

I don't see a difference from system gcc 13. It has F77_FUNC_EQUIV and
WITH_G77_WRAPPERS set as well. Do you see this for all build variants of fftw?

[Bug middle-end/100861] False positive -Wmismatched-new-delete with destroying operator delete

2023-12-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100861

Andrew Pinski  changed:

   What|Removed |Added

 CC||fchelnokov at gmail dot com

--- Comment #4 from Andrew Pinski  ---
*** Bug 113113 has been marked as a duplicate of this bug. ***

[Bug c++/113113] False -Wmismatched-new-delete in case of destroying operator delete

2023-12-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113113

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
Dup of bug 100861.

*** This bug has been marked as a duplicate of bug 100861 ***

[Bug fortran/113118] New: ICE on assignment of derived types with allocatable class component

2023-12-22 Thread baradi09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113118

Bug ID: 113118
   Summary: ICE on assignment of derived types with allocatable
class component
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: baradi09 at gmail dot com
  Target Milestone: ---

I get an internal compiler error with the following demo code. As far as I can
judge, the code is standard-conforming.

module bugdemo
  implicit none

  type :: base_type
    character(:), allocatable :: name
  end type base_type

  type :: base_type_item
    class(base_type), allocatable :: item
  end type base_type_item

  type, extends(base_type) :: derived_type
    integer :: val = 0
  end type derived_type

contains

  function derived_type_as_item(name, val) result(item)
    character(*), intent(in) :: name
    integer, intent(in) :: val
    type(base_type_item), allocatable :: item

    item = base_type_item(derived_type(name=name, val=val))

  end function derived_type_as_item

end module bugdemo

Compiling it with

gfortran -c bugdemo.f90

results in

   23 | item = base_type_item(derived_type(name=name, val=val)) 
  |   1 
internal compiler error: in fold_convert_loc, at fold-const.cc:2627 
0x69f6fa fold_convert_loc(unsigned int, tree_node*, tree_node*) 
../.././gcc/fold-const.cc:2627  
0x824e17 gfc_trans_subcomponent_assign 
../.././gcc/fortran/trans-expr.cc:9027 
0x825a22 gfc_trans_structure_assign(tree_node*, gfc_expr*, bool, bool) 
../.././gcc/fortran/trans-expr.cc:9265
0x826808 gfc_conv_structure(gfc_se*, gfc_expr*, int)   
../.././gcc/fortran/trans-expr.cc:9332 
0x81d6fc gfc_conv_expr(gfc_se*, gfc_expr*) 
../.././gcc/fortran/trans-expr.cc:9500 
0x829ab5 gfc_trans_assignment_1
../.././gcc/fortran/trans-expr.cc:11877
0x7e0f77 trans_code
../.././gcc/fortran/trans.cc:2229  
0x80f1e9 gfc_generate_function_code(gfc_namespace*)
../.././gcc/fortran/trans-decl.cc:7715 
0x7e5641 gfc_generate_module_code(gfc_namespace*)  
../.././gcc/fortran/trans.cc:2649  
0x785d35 translate_all_program_units   
../.././gcc/fortran/parse.cc:6707  
0x785d35 gfc_parse_file()  
../.././gcc/fortran/parse.cc:7026  
0x7dde4f gfc_be_parse_file 
../.././gcc/fortran/f95-lang.cc:229
Please submit a full bug report, with preprocessed source. 
Please include the complete backtrace with any bug report. 
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #9 from Jakub Jelinek  ---
Seems in *.reload we have:
(insn 5 4 6 2 (set (reg/v/f:SI 4 si [orig:504 Im ] [504])
(mem/f/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 540 [0x21c])) [3 Im+0 S4 A32])) "hc2cf2_16.c":456:1
85 {*movsi_internal}
 (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 16 argp)
(const_int 12 [0xc])) [3 Im+0 S4 A32])
(nil)))
...
(insn 487 486 488 3 (set (mem/f/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 540 [0x21c])) [3 Im+0 S4 A32])
(reg/v/f:SI 4 si [orig:504 Im ] [504])) "hc2cf2_16.c":691:14 85
{*movsi_internal}
 (nil))
(insn 488 487 489 3 (set (reg/v/f:SI 4 si [orig:500 W ] [500])
(reg/v/f:SI 0 ax [orig:500 W ] [500])) "hc2cf2_16.c":691:14 85
{*movsi_internal}
 (nil))
...
(insn 491 490 474 3 (set (reg/v/f:SI 2 cx [orig:504 Im ] [504])
(mem/f/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 540 [0x21c])) [3 Im+0 S4 A32])) "hc2cf2_16.c":691:14
85 {*movsi_internal}
 (nil))
Now, postreload removes the useless store with uid 487 and due to the swapping
of vzeroupper pass with postreload, the good vs. bad difference before gcse is
 (insn 5 4 6 2 (set (reg/v/f:SI 4 si [orig:504 Im ] [504])
 (mem/f/c:SI (plus:SI (reg/f:SI 7 sp)
 (const_int 540 [0x21c])) [3 Im+0 S4 A32])) "hc2cf2_16.c":456:1
85 {*movsi_internal}
- (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 16 argp)
-(const_int 12 [0xc])) [3 Im+0 S4 A32])
-(nil)))
+ (expr_list:REG_UNUSED (reg/v/f:SI 4 si [orig:504 Im ] [504])
+(expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 16 argp)
+(const_int 12 [0xc])) [3 Im+0 S4 A32])
+(nil
i.e. an extra REG_UNUSED note (on some others as well).
At that point the note is correct.
Then comes gcse and sets cx to si at the start of the bb where insn 491 used to
appear, extending the lifetime of the register, but the REG_UNUSED note is not
removed.
And then comes the pro_and_epilogue pass and because of the REG_UNUSED note
deletes the load in insn 5, even when it is now actually used.
So we are back to the PR112572 discussions.
The options are: trust REG_UNUSED notes only in passes which explicitly request
computation of the notes problem, as Richard proposed; or mark them as
non-trustworthy after a couple of passes known to screw up REG_UNUSED notes
(mainly cse/gcse/postreload-cse), either always or only when those passes
actually extend the lifetime of something; or have those passes actively remove
REG_UNUSED notes when they extend a lifetime.
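
The failure mode described above can be sketched with a toy model. Everything
in it is hypothetical (the Insn struct, recompute_notes, delete_unused and
after_hoist_with_stale_note are illustration names, not GCC's RTL or df
interfaces); it only shows why a pass that trusts a stale "unused" note deletes
a load that a later-inserted copy depends on, and why recomputing the notes
first avoids that:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not GCC's RTL.
struct Insn {
    std::string dest;   // register written, "" if none
    std::string src;    // register read, "" if none (e.g. a load from memory)
    bool unused_note;   // stands in for a REG_UNUSED note on dest
};

// Recompute every note: dest is unused if it is overwritten (or the
// sequence ends) before any later instruction reads it.
void recompute_notes(std::vector<Insn>& insns) {
    for (std::size_t i = 0; i < insns.size(); ++i) {
        bool used = false;
        for (std::size_t j = i + 1; j < insns.size() && !used; ++j) {
            if (insns[j].src == insns[i].dest) used = true;
            else if (insns[j].dest == insns[i].dest) break;  // overwritten first
        }
        insns[i].unused_note = !insns[i].dest.empty() && !used;
    }
}

// A deletion pass that trusts the notes without recomputing them.
std::vector<Insn> delete_unused(const std::vector<Insn>& insns) {
    std::vector<Insn> kept;
    for (const Insn& insn : insns)
        if (!insn.unused_note) kept.push_back(insn);
    return kept;
}

// The sequence after a gcse-like pass hoisted "cx = si" above "si = ax"
// but left the old note on the first load in place.
std::vector<Insn> after_hoist_with_stale_note() {
    return {
        {"si", "", true},     // load into si; the note was correct before the hoist
        {"cx", "si", false},  // new copy that now reads the loaded value
        {"si", "ax", false},  // si is overwritten here
        {"", "si", false},    // use of si
        {"", "cx", false},    // use of cx
    };
}
```

Calling delete_unused directly on the stale sequence drops the load, so the
copy into cx reads a register that was never set; calling recompute_notes first
keeps all five instructions.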

[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059

--- Comment #10 from Jakub Jelinek  ---
(In reply to Florian Weimer from comment #8)
> (In reply to Jakub Jelinek from comment #3)
> > Seems the package's configure is affected by most likely the modern C
> > changes,
> > I see
> > --- config.h.good   2023-12-22 17:47:44.615207332 +0100
> > +++ config.h.bad2023-12-22 17:46:42.304068624 +0100
> > @@ -37,7 +37,7 @@
> >  /* #undef F77_FUNC_ */
> >  
> >  /* Define if F77_FUNC and F77_FUNC_ are equivalent. */
> > -/* #undef F77_FUNC_EQUIV */
> > +#define F77_FUNC_EQUIV 1
> >  
> >  /* Define if F77 and FC dummy `main' functions are identical. */
> >  /* #undef FC_DUMMY_MAIN_EQ_F77 */
> > @@ -404,7 +404,7 @@
> >  
> >  /* Include g77-compatible wrappers in addition to any other Fortran
> > wrappers.
> > */
> > -/* #undef WITH_G77_WRAPPERS */
> > +#define WITH_G77_WRAPPERS 1
> >  
> >  /* Use our own aligned malloc routine; mainly helpful for Windows systems
> > lacking aligned allocation system-library routines. */
> > diff in config.h between my system gcc 12 and gcc 14 snapshot.
> > But that isn't the reason for the failure.
> 
> I don't see a different to system gcc 13. It has F77_FUNC_EQUIV and
> WITH_G77_WRAPPERS set as well. Do you see this for all build variants of
> fftw?

Ah, seems I don't have system gfortran installed, which is probably the cause
of the config.h difference.  Sorry for the false alarm.

[Bug tree-optimization/113119] New: ICE: verify_ssa failed: definition in block 18 does not dominate use in block 4 at -O1 with _BitInt

2023-12-22 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113119

Bug ID: 113119
   Summary: ICE: verify_ssa failed: definition in block 18 does
not dominate use in block 4 at -O1 with _BitInt
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
CC: jakub at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 56924
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56924&action=edit
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O testcase.c 
testcase.c: In function 'foo':
testcase.c:6:1: error: definition in block 18 does not dominate use in block 4
6 | foo(_BitInt(4058) d)
  | ^~~
for SSA_NAME: _55 in statement:
_3 = _55;
during GIMPLE pass: bitintlower
testcase.c:6:1: internal compiler error: verify_ssa failed
0x177189f verify_ssa(bool, bool)
/repo/gcc-trunk/gcc/tree-ssa.cc:1203
0x13c53e5 execute_function_todo
/repo/gcc-trunk/gcc/passes.cc:2095
0x13c584e execute_todo
/repo/gcc-trunk/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-6806-20231222122934-gcefae511ed7-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r14-6806-20231222122934-gcefae511ed7-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231222 (experimental) (GCC)

[Bug c++/113083] [14 Regression][arm] ICE in fold_convert_loc, at fold-const.cc:2602 since r14-5979-g99d114c15523e0

2023-12-22 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113083

--- Comment #4 from Marek Polacek  ---
The problem occurs only when we declone cdtors and are on a
targetm.cxx.cdtor_returns_this target like ARM.

Decloning causes us to create a thunk calling the "main" ctor:

A*
A::A (A *const this)
{
  return A::A (this);
}

'this' is now accepted in constexpr so we evaluate the call into {} and end up
with

{
  return *this = {};
}

which is gimplified into

{
  A *D.4937;
  D.4937 = *this = {};
}

but that means there's a discrepancy: we're converting A to A* and that
crashes.

I wonder if we should refuse to evaluate A::A (this) (returning a pointer) into
{} (not a pointer).

[Bug fortran/113118] ICE on assignment of derived types with allocatable class component

2023-12-22 Thread baradi09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113118

--- Comment #1 from Bálint Aradi  ---
Just a further note: if I leave out the dummy argument names, I no longer get
an ICE, but the program still does not compile:


   24 | item = base_type_item(derived_type(name, val))
  | 1
Error: Too many components in structure constructor at (1)

Apparently, the fields of the base type are not considered when the structure
constructor of the derived type is called.

[Bug fortran/113118] ICE on assignment of derived types with allocatable class component

2023-12-22 Thread baradi09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113118

--- Comment #2 from Bálint Aradi  ---
Last note: replacing the problematic line with

allocate(item)
item%item = derived_type(name=name, val=val)

seems to compile (but I did not check whether the compiled code behaves
correctly).

[Bug fortran/113118] ICE on assignment of derived types with allocatable class component

2023-12-22 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113118

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

  Known to work||14.0
  Known to fail||13.2.1
   Keywords||ice-on-valid-code

--- Comment #3 from anlauf at gcc dot gnu.org ---
Works here on 14-trunk, but fails on 13-branch.
Might have been fixed recently.

Are you able to test on 14-trunk?

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

Vineet Gupta  changed:

   What|Removed |Added

 CC||vineetg at gcc dot gnu.org

--- Comment #13 from Vineet Gupta  ---
(In reply to JuzheZhong from comment #12)
> (In reply to Patrick O'Neill from comment #11)
> > (In reply to Patrick O'Neill from comment #10)
> > > I've kicked off 2 spec runs (zvl 128 and 256) using r14-6765-g4d9e0f3f211.
> > > I'll let you know the results when they finish.
> > 
> > My terminal crashed - so these are partial results:
> > zvl256: 3 runtime failures
> > 531.deepsjeng
> > ???
> > ???
> > 
> > zvl128: 1 runtime failure
> > 527.cam4_r
> > 
> > If I had to guess I would say the 2 ??? fails are the existing 521/549.
> 
> You mean those 2 cases are still failing?
> Do you have any ideas to locate those FAIL and extract them as a simple case?

> zvl128 / no vl: 1 runtime failure
> 527.cam4_r

Yes, this still remains. It is hard to debug (for me at least) as this is
Fortran.

However, this goes away if simple_vsetvl is used (with -Ofast for the rest of
the build) - using [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641342.html

[Bug tree-optimization/86072] Poor codegen with atomics

2023-12-22 Thread phosit at autistici dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86072

Phosit  changed:

   What|Removed |Added

 CC||phosit at autistici dot org

--- Comment #4 from Phosit  ---
(In reply to Richard Biener from comment #2)
> Somebody has to decide if it's worth optimizing them and has to sit down and
> exactly specify what kind of optimizations are valid.
There is a paper about the optimization of atomics. It might not be detailed
enough.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
Note that the memory-model changed a bit since the release of that paper.

> I guess it's worth optimizing them if these cases appear in real-world code
> (and then we'd like to see examples).
std::shared_ptr uses fetch_add and fetch_sub. When a std::shared_ptr is not used
for synchronization this optimization could take effect.
PR 48987 is specifically about combining multiple fetch_add and fetch_sub.
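
A minimal sketch of the pattern in question, assuming a simplified reference
count (RefCount and copy_then_drop are hypothetical names and not libstdc++'s
actual control block): copying a handle and immediately dropping the copy is
one fetch_add followed by one fetch_sub on the same counter. Whether a compiler
may fold such a pair is exactly what is being discussed here, so this shows
only the source-level shape, not a claim about what GCC emits:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a shared_ptr-style use count.
struct RefCount {
    std::atomic<long> uses{1};
};

// Copy a handle and immediately drop the copy: one increment followed by
// one decrement. Returns true if the decrement released the last reference.
inline bool copy_then_drop(RefCount& rc) {
    rc.uses.fetch_add(1, std::memory_order_relaxed);
    return rc.uses.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

The net effect on the counter is zero, and the decrement can never observe the
last reference because the caller still holds one.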

[Bug tree-optimization/113119] ICE: verify_ssa failed: definition in block 18 does not dominate use in block 4 at -O1 with _BitInt

2023-12-22 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113119

--- Comment #1 from Jakub Jelinek  ---
Without the second redundant __builtin_add_overflow, we get
  a.0_1 = a;
  _7 = .ADD_OVERFLOW (a.0_1, 0);
  _2 = REALPART_EXPR <_7>;
  _3 = IMAGPART_EXPR <_7>;
  _4 = (_Bool) _3;
  c = _4;
  _5 = (_BitInt(8)) _2;
  b = _5;
before bitint lowering and lower it well, but the redundant call results in
  a.0_1 = a;
  _7 = .ADD_OVERFLOW (a.0_1, 0);
  _2 = IMAGPART_EXPR <_7>;
  _3 = (_Bool) _2;
  c = _3;
  _4 = REALPART_EXPR <_7>;
  _5 = (_BitInt(8)) _4;
  b = _5;
and optimizable_arith_overflow doesn't flag that as non-optimizable, so
the bitint lowering of .ADD_OVERFLOW happens in that case on the REALPART_EXPR
stmt.
optimizable_arith_overflow checks if both REALPART_EXPR and IMAGPART_EXPR
appear in the same bb (and that there are no other uses), but doesn't check
that REALPART_EXPR is first.  Either it should check that or perhaps allow it
first but require that it is not used in statements before the REALPART_EXPR.
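
For readers less familiar with the builtin, here is a small sketch of the two
values .ADD_OVERFLOW produces, using std::int8_t in place of _BitInt(8)
(AddResult and checked_add are hypothetical names for illustration). The
wrapped sum corresponds to what the dumps above read with REALPART_EXPR and the
overflow flag to what they read with IMAGPART_EXPR:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration.
struct AddResult {
    std::int8_t value;  // wrapped sum (the REALPART_EXPR side)
    bool overflow;      // overflow flag (the IMAGPART_EXPR side)
};

// Thin wrapper around the GCC/Clang builtin; the sum is computed as if in
// infinite precision and then stored into the 8-bit result.
inline AddResult checked_add(std::int8_t a, std::int8_t b) {
    AddResult r{};
    r.overflow = __builtin_add_overflow(a, b, &r.value);
    return r;
}
```

In source code the two parts can be consumed in either order, which is why the
lowering cannot assume the real part is read first.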

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #14 from Vineet Gupta  ---
(In reply to Vineet Gupta from comment #13)
> (In reply to JuzheZhong from comment #12)
> > (In reply to Patrick O'Neill from comment #11)
> > > (In reply to Patrick O'Neill from comment #10)
> > > > I've kicked off 2 spec runs (zvl 128 and 256) using 
> > > > r14-6765-g4d9e0f3f211.
> > > > I'll let you know the results when they finish.
> > > 
> > > My terminal crashed - so these are partial results:
> > > zvl256: 3 runtime failures
> > > 531.deepsjeng
> > > ???
> > > ???

At least 549.fotonik3d runtime failure with vl256 remains even with
simple_vsetvl.

[Bug c++/53499] Incorrect partial ordering result with member vs non-member

2023-12-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53499

Patrick Palka  changed:

   What|Removed |Added

 CC||armagvvg at gmail dot com

--- Comment #6 from Patrick Palka  ---
*** Bug 113117 has been marked as a duplicate of this bug. ***

[Bug c++/113117] ambiguous call during operator overloading is not detected for templates

2023-12-22 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113117

Patrick Palka  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 CC||ppalka at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Patrick Palka  ---
Recently fixed on trunk by r14-6221-gc1e54c82a9e185, and indeed this seems
pretty much a dup of PR53499.  Thanks for the detailed bug report nonetheless

*** This bug has been marked as a duplicate of bug 53499 ***

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #15 from JuzheZhong  ---
Currently, we don't have many runtime FAILs or ICEs left in full coverage
testing.

I suspect it is a very rare corner case in SPEC.

You don't have to debug it. You just need to give me a preprocessed source file.

Like this:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110560

You can see the attachment from the Google Highway folks is very big, but I can
still fix the issue as long as you give me sources that let me reproduce it.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #16 from Vineet Gupta  ---
(In reply to JuzheZhong from comment #15)
> Currently, we don't have much run FAIL and ICE left in full coverage testing.
> 
> I suspect it is very corner case in SPEC.
> 
> You don't have to debug it. Just need to give me a preprocessed source file.
> 
> Like this:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110560
> 
> You can see google highway folks attachment is very big but I still can fix
> the issue as long as you can give me some sources that I can reproduce the
> issues.

As I mentioned already, these are runtime mismatches, so we don't know where
the issue is and thus have no reduced test case.

FWIW I could/would have debugged the gcc code if I had a reduced test.

So we need to dig down into the guts of the benchmark and see where the output
is generated, checkpoint it, and so on.

The other approach is to try to "defeature" autovec and see if that can point
to broad areas (in backend/middle-end) where the issue could be.
e.g.
  - simple vs. lazy vsetvl
  - disabling reductions etc.

BTW I'm surprised you are not seeing these as there is nothing rivos specific
here. Are you running the full SPEC suite, including Fortran / Float workloads?

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #17 from JuzheZhong  ---
PLCT told me they passed with zvl256b.

I always run SPEC with FIXED-VLMAX since we always care about peak performance
on our board.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #18 from Vineet Gupta  ---
(In reply to JuzheZhong from comment #17)
> PLCT told me they passed with zvl256b.
> 
> I always run SPEC with FIXED-VLMAX since we always care about peak
> performance
> on our board.

Sure we all have our preferred peak performance configs. But the compiler needs
to work for all vendors' configs. So as a test, can you try a scalable build
run at your end to at least see if you can see those issues ?

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #19 from JuzheZhong  ---
(In reply to Vineet Gupta from comment #18)
> (In reply to JuzheZhong from comment #17)
> > PLCT told me they passed with zvl256b.
> > 
> > I always run SPEC with FIXED-VLMAX since we always care about peak
> > performance
> > on our board.
> 
> Sure we all have our preferred peak performance configs. But the compiler
> needs to work for all vendors' configs. So as a test, can you try a scalable
> build run at your end to at least see if you can see those issues ?

I am not able to build and test SPEC since I don't have QEMU and SPEC
environment.

I should ask my colleague to do that but they are quite busy with company's
things and frankly I can't pull more resource on open source work from my
company.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #20 from JuzheZhong  ---
I am not able to build and test SPEC since I don't have QEMU and SPEC
environment.

I should ask my colleague to do that but they are quite busy with company's
things and frankly I can't pull more resource on open source work from my
company.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #21 from JuzheZhong  ---
Btw, I saw there are 2 more FAILs:

FAIL: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/tsvc/vect-tsvc-s1115.c execution test
FAIL: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/tsvc/vect-tsvc-s114.c execution test

on -march=rv64gcv_zvl1024b --param=riscv-autovec-lmul=dynamic.

I am not sure whether they are same issue as SPEC issue on your side.

I am going to fix them to see whether we are "lucky".

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #22 from palmer at gcc dot gnu.org ---
(In reply to JuzheZhong from comment #19)
> (In reply to Vineet Gupta from comment #18)
> > (In reply to JuzheZhong from comment #17)
> > > PLCT told me they passed with zvl256b.
> > > 
> > > I always run SPEC with FIXED-VLMAX since we always care about peak
> > > performance
> > > on our board.
> > 
> > Sure we all have our preferred peak performance configs. But the compiler
> > needs to work for all vendors' configs. So as a test, can you try a scalable
> > build run at your end to at least see if you can see those issues ?
> 
> I am not able to build and test SPEC since I don't have QEMU and SPEC
> environment.

Sorry, I'm kind of confused here: you're saying you can't build/test SPEC, but
then above saying you run SPEC.

> I should ask my colleague to do that but they are quite busy with company's
> things and frankly I can't pull more resource on open source work from my
> company.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #23 from JuzheZhong  ---
(In reply to palmer from comment #22)
> (In reply to JuzheZhong from comment #19)
> > (In reply to Vineet Gupta from comment #18)
> > > (In reply to JuzheZhong from comment #17)
> > > > PLCT told me they passed with zvl256b.
> > > > 
> > > > I always run SPEC with FIXED-VLMAX since we always care about peak
> > > > performance
> > > > on our board.
> > > 
> > > Sure we all have our preferred peak performance configs. But the compiler
> > > needs to work for all vendors' configs. So as a test, can you try a 
> > > scalable
> > > build run at your end to at least see if you can see those issues ?
> > 
> > I am not able to build and test SPEC since I don't have QEMU and SPEC
> > environment.
> 
> Sorry, I'm kind of confused here: you're saying you can't build/test SPEC,
> but then above saying you run SPEC.
> 
> > I should ask my colleague to do that but they are quite busy with company's
> > things and frankly I can't pull more resource on open source work from my
> > company.

I am sorry that my typo confused you. I should have said "we" instead of "I" :).

"We" is PLCT lab, my colleague, and Li Pan.

I just noticed my careless writing: sometimes I say "I", sometimes "we",
since I always ask somebody else to do the things I want done.

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #24 from JuzheZhong  ---
CC jiawei, who runs SPEC for me. Maybe you can help him reproduce the issue;
then I can debug it from his feedback.

[Bug target/113112] RISC-V: Dynamic LMUL feature stabilization for GCC-14 release

2023-12-22 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112

--- Comment #1 from GCC Commits  ---
The master branch has been updated by Pan Li:

https://gcc.gnu.org/g:290230034092898981488d0716ddae43bd36c09f

commit r14-6810-g290230034092898981488d0716ddae43bd36c09f
Author: Juzhe-Zhong 
Date:   Sat Dec 23 07:07:42 2023 +0800

RISC-V: Make PHI initial value occupy live V_REG in dynamic LMUL cost model
analysis

Consider the following case:

foo:
ble a0,zero,.L11
lui a2,%hi(.LANCHOR0)
addi    sp,sp,-128
addi    a2,a2,%lo(.LANCHOR0)
mv  a1,a0
vsetvli a6,zero,e32,m8,ta,ma
vid.v   v8
vs8r.v  v8,0(sp) ---> spill
.L3:
vl8re32.v   v16,0(sp)---> reload
vsetvli a4,a1,e8,m2,ta,ma
li  a3,0
vsetvli a5,zero,e32,m8,ta,ma
vmv8r.v v0,v16
vmv.v.x v8,a4
vmv.v.i v24,0
vadd.vv v8,v16,v8
vmv8r.v v16,v24
vs8r.v  v8,0(sp)---> spill
.L4:
addiw   a3,a3,1
vadd.vv v8,v0,v16
vadd.vi v16,v16,1
vadd.vv v24,v24,v8
bne a0,a3,.L4
vsetvli zero,a4,e32,m8,ta,ma
sub a1,a1,a4
vse32.v v24,0(a2)
slli    a4,a4,2
add a2,a2,a4
bne a1,zero,.L3
li  a0,0
addi    sp,sp,128
jr  ra
.L11:
li  a0,0
ret

This picks an unexpected LMUL = 8.

The root cause is that we didn't include the PHI initial value in the dynamic
LMUL calculation:

  # j_17 = PHI ---> # vect_vec_iv_.8_24 = PHI <_25(9),
  { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }(5)>

We didn't count { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } as consuming a vector register, but
it does allocate a vector register group.

This patch fixes the missing count. After this patch we pick the ideal
LMUL (LMUL = M4):

foo:
ble a0,zero,.L9
lui a4,%hi(.LANCHOR0)
addi    a4,a4,%lo(.LANCHOR0)
mv  a2,a0
vsetivli zero,16,e32,m4,ta,ma
vid.v   v20
.L3:
vsetvli a3,a2,e8,m1,ta,ma
li  a5,0
vsetivli zero,16,e32,m4,ta,ma
vmv4r.v v16,v20
vmv.v.i v12,0
vmv.v.x v4,a3
vmv4r.v v8,v12
vadd.vv v20,v20,v4
.L4:
addiw   a5,a5,1
vmv4r.v v4,v8
vadd.vi v8,v8,1
vadd.vv v4,v16,v4
vadd.vv v12,v12,v4
bne a0,a5,.L4
slli    a5,a3,2
vsetvli zero,a3,e32,m4,ta,ma
sub a2,a2,a3
vse32.v v12,0(a4)
add a4,a4,a5
bne a2,zero,.L3
.L9:
li  a0,0
ret

Tested on --with-arch=gcv, no regressions.

PR target/113112

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs):
Refine dump information.
(preferred_new_lmul_p): Make PHI initial value into live regs
calculation.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: New test.
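
The register-pressure argument in the commit message above can be illustrated with a little arithmetic. This is only a simplified sketch, not the logic in riscv-vector-costs.cc: the helper names `lmul_fits` and `pick_lmul` are hypothetical, and the model assumes every live vector value occupies one register group of LMUL registers out of the 32 RVV vector registers. Reading the listings, the LMUL = 8 code keeps four groups in registers (v0, v8, v16, v24) and spills a fifth value, while the M4 code uses five groups (v4, v8, v12, v16, v20) with no spill.

```c
#include <stdbool.h>

/* RVV has 32 architectural vector registers (v0..v31). */
enum { NUM_VECTOR_REGS = 32 };

/* Hypothetical helper: does a loop with `live_groups` simultaneously live
   vector values fit in the register file when each value occupies `lmul`
   registers?  */
static bool lmul_fits(int live_groups, int lmul)
{
    return live_groups * lmul <= NUM_VECTOR_REGS;
}

/* Hypothetical helper: largest LMUL in {8, 4, 2} that fits without
   spilling, falling back to 1 otherwise.  */
static int pick_lmul(int live_groups)
{
    for (int lmul = 8; lmul > 1; lmul /= 2)
        if (lmul_fits(live_groups, lmul))
            return lmul;
    return 1;
}
```

Under this model, undercounting by one (four live values instead of five) yields `pick_lmul(4) == 8`, while the corrected count gives `pick_lmul(5) == 4`, which matches the direction of the change described in the commit.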

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread jiawei at iscas dot ac.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

--- Comment #25 from jiawei  ---
I ran SPEC2017-v1.1.9 with rv64gcv_zvl256b; it passed the compile and run on
the base and validate cases, using qemu 8.1.0.

[Bug middle-end/113109] [14 Regression] g++ EH tests fail at execution time for cris-elf after r14-6674-g4759383245ac97

2023-12-22 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113109

--- Comment #3 from Hans-Peter Nilsson  ---
It's __builtin_eh_return that's miscompiled, such that the "handler" isn't
installed and the calling function will return to its caller instead of the
handler.

For the example below:

void f(__UINTPTR_TYPE__ p1, void *p2)
{
  __builtin_eh_return(p1, p2);
}

...there's a tell-tale diff between 6673 and 6674 in generated assembly code at
-O2:

@@ -23,7 +23,6 @@ _f:
move.d $r13,[$sp]
 .LCFI5:
move.d $r10,$r9
-   move.d $r11,[$sp+16]
move.d [$sp+],$r13
move.d [$sp+],$r12
move.d [$sp+],$r11


cris.h defines EH_RETURN_HANDLER_RTX (as a call to cris_return_addr_rtx
yielding) gen_rtx_MEM (Pmode, plus_constant (Pmode, virtual_incoming_args_rtx,
-4)).

I'm "guessing" that the problem with the patch is that anything any port
stores through a pointer based on virtual_incoming_args_rtx before returning,
is now eliminated.

[Bug c++/112883] FAIL: g++.dg/modules/xtreme-header-2_c.C -std=c++2b (test for excess errors)

2023-12-22 Thread hp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112883

Hans-Peter Nilsson  changed:

   What|Removed |Added

 CC||hp at gcc dot gnu.org

--- Comment #1 from Hans-Peter Nilsson  ---
Is this different to PR112737 ?

[Bug c++/105467] Dependency file produced by C++ modules causes Ninja errors

2023-12-22 Thread saifi.khan at nishan dot io via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105467

Saifi Khan  changed:

   What|Removed |Added

 CC||saifi.khan at nishan dot io

--- Comment #3 from Saifi Khan  ---
(In reply to jpakkane from comment #2)
> It would be preferable to have the default work out of the box than having
> every end user having to add compiler flags to make things work.
> 
> Ninja is the most popular underlying build system for modules, having it
> work by default would make things easier for many people.

as an end user, i'd prefer to explicitly add compiler flag instead of getting
pinja, hazel, teson or some such by default !

[Bug testsuite/113085] New test case libgomp.c/alloc-pinned-1.c from r14-6499-g348874f0baac0f fails

2023-12-22 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113085

--- Comment #2 from seurer at gcc dot gnu.org ---
Looks like it is 65,536

seurer@ltcden2-lp1:~/gcc/git/build/gcc-test$ getconf PAGESIZE 
65536

[Bug testsuite/113085] New test case libgomp.c/alloc-pinned-1.c from r14-6499-g348874f0baac0f fails

2023-12-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113085

Andrew Pinski  changed:

   What|Removed |Added

 CC||pinskia at gcc dot gnu.org

--- Comment #3 from Andrew Pinski  ---
Yes, while the most common page size on x86_64 is 4k, 64k and 16k page sizes
also occur on other targets (16k shows up on aarch64, especially on a Mac,
due to limitations in the HW).
I personally use 64k on aarch64.
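
Since the page size differs between targets, code that works in page-sized units can ask the system instead of assuming 4k. The following is a minimal sketch using POSIX `sysconf(_SC_PAGESIZE)`, the same value that `getconf PAGESIZE` reports; the helper names `query_page_size` and `round_up_to_pages` are hypothetical and are not taken from alloc-pinned-1.c.

```c
#include <stddef.h>
#include <unistd.h>

/* Return the system page size in bytes, or 0 if it cannot be determined. */
static size_t query_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    return sz > 0 ? (size_t) sz : 0;
}

/* Hypothetical helper: round `bytes` up to a whole number of pages.
   `page_size` must be nonzero.  */
static size_t round_up_to_pages(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size * page_size;
}
```

With a 64k page, a one-byte request rounds up to 65536 bytes, so a limit sized for a few 4k pages may already be exceeded by a single page on such a system.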

[Bug middle-end/113109] [14 Regression] g++ EH tests fail at execution time for cris-elf after r14-6674-g4759383245ac97

2023-12-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113109

--- Comment #4 from Andrew Pinski  ---
Hmm, see PR 32398 and PR 32769. PR 32769 is interesting because it was caused
by the merge of the df branch where the store was being removed just like here
on cris. 

Oh and reading
https://inbox.sourceware.org/gcc-patches/200707151749.l6fhnxrt010...@hiauly1.hia.nrc.ca/
even mentions this exact issue it seems where dse.cc is removing the store and
such.

[Bug tree-optimization/113119] ICE: verify_ssa failed: definition in block 18 does not dominate use in block 4 at -O1 with _BitInt

2023-12-22 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113119

--- Comment #2 from Zdenek Sojka  ---
Created attachment 56925
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56925&action=edit
slightly less reduced testcase, ICEing elsewhere

The original testcase was ICEing at a different place, as does this testcase:

$ x86_64-pc-linux-gnu-gcc -O testcase.c 
during GIMPLE pass: bitintlower
testcase.c: In function 'foo':
testcase.c:8:1: internal compiler error: in operator[], at vec.h:910
8 | foo(_BitInt(4058) d)
  | ^~~
0x8a0a88 vec::operator[](unsigned int)
/repo/gcc-trunk/gcc/vec.h:910
0x8a0c6e vec::operator[](unsigned int)
/repo/gcc-trunk/gcc/value-relation.cc:736
0x8a0c6e vec::operator[](unsigned int)
/repo/gcc-trunk/gcc/vec.h:1599
0x8a0c6e equiv_oracle::add_equiv_to_block(basic_block_def*, bitmap_head*)
/repo/gcc-trunk/gcc/value-relation.cc:721
0x185e5c2 equiv_oracle::register_relation(basic_block_def*, relation_kind_t,
tree_node*, tree_node*)
/repo/gcc-trunk/gcc/value-relation.cc:675
0x26597e9 fold_using_range::range_of_range_op(vrange&,
gimple_range_op_handler&, fur_source&)
/repo/gcc-trunk/gcc/gimple-range-fold.cc:687
0x2659bf2 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&,
tree_node*)
/repo/gcc-trunk/gcc/gimple-range-fold.cc:602
0x2641c7e gimple_ranger::fold_range_internal(vrange&, gimple*, tree_node*)
/repo/gcc-trunk/gcc/gimple-range.cc:265
0x2641c7e gimple_ranger::range_of_stmt(vrange&, gimple*, tree_node*)
/repo/gcc-trunk/gcc/gimple-range.cc:326
0x264597a gimple_ranger::range_of_expr(vrange&, tree_node*, gimple*)
/repo/gcc-trunk/gcc/gimple-range.cc:134
0x261e312 range_to_prec
/repo/gcc-trunk/gcc/gimple-lower-bitint.cc:1980
0x2620acb handle_operand_addr
/repo/gcc-trunk/gcc/gimple-lower-bitint.cc:2025
0x26217e7 lower_muldiv_stmt
/repo/gcc-trunk/gcc/gimple-lower-bitint.cc:3365
0x2635599 gimple_lower_bitint
/repo/gcc-trunk/gcc/gimple-lower-bitint.cc:6490
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug tree-optimization/113120] New: during GIMPLE pass: bitintlower ICE: SIGSEGV with _BitInt() at -O2

2023-12-22 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113120

Bug ID: 113120
   Summary: during GIMPLE pass: bitintlower ICE: SIGSEGV with
_BitInt() at -O2
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 56926
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56926&action=edit
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O2 testcase.c -wrapper valgrind,-q
==23272== Invalid read of size 2
==23272==at 0x2635303: contains_struct_check (tree.h:3757)
==23272==by 0x2635303: gimple_lower_bitint() (gimple-lower-bitint.cc:6585)
==23272==by 0x13C87DA: execute_one_pass(opt_pass*) (passes.cc:2646)
==23272==by 0x13C90CF: execute_pass_list_1(opt_pass*) (passes.cc:2755)
==23272==by 0x13C90E1: execute_pass_list_1(opt_pass*) (passes.cc:2756)
==23272==by 0x13C9108: execute_pass_list(function*, opt_pass*)
(passes.cc:2766)
==23272==by 0xFC9795: expand (cgraphunit.cc:1842)
==23272==by 0xFC9795: cgraph_node::expand() (cgraphunit.cc:1795)
==23272==by 0xFCAADA: expand_all_functions (cgraphunit.cc:2025)
==23272==by 0xFCAADA: symbol_table::compile() [clone .part.0]
(cgraphunit.cc:2399)
==23272==by 0xFCD657: compile (cgraphunit.cc:2312)
==23272==by 0xFCD657: symbol_table::finalize_compilation_unit()
(cgraphunit.cc:2584)
==23272==by 0x150A701: compile_file() (toplev.cc:473)
==23272==by 0xDE58FB: do_compile (toplev.cc:2150)
==23272==by 0xDE58FB: toplev::main(int, char**) (toplev.cc:2306)
==23272==by 0xDE70DA: main (main.cc:39)
==23272==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==23272== 
during GIMPLE pass: bitintlower
testcase.c: In function 'bar':
testcase.c:5:1: internal compiler error: Segmentation fault
5 | bar(unsigned _BitInt(1) c, _BitInt(401) d)
  | ^~~
0x150a21f crash_signal
/repo/gcc-trunk/gcc/toplev.cc:316
0x2635303 contains_struct_check(tree_node*, tree_node_structure_enum, char
const*, int, char const*)
/repo/gcc-trunk/gcc/tree.h:3757
0x2635303 gimple_lower_bitint
/repo/gcc-trunk/gcc/gimple-lower-bitint.cc:6585
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-6806-20231222122934-gcefae511ed7-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r14-6806-20231222122934-gcefae511ed7-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231222 (experimental) (GCC)