[Bug go/106266] New: Libgo fails with recent glibc

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106266

Bug ID: 106266
   Summary: Libgo fails with recent glibc
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: go
  Assignee: ian at airs dot com
  Reporter: marxin at gcc dot gnu.org
  Target Milestone: ---

Fails with:

[ 4342s]
/home/abuild/rpmbuild/BUILD/gcc-13.0.0+git194331/obj-x86_64-suse-linux/./gcc/xgcc
-B/home/abuild/rpmbuild/BUILD/gcc-13.0.0+git194331/obj-x86_64-suse-linux/./gcc/
-B/usr/x86_64-suse-linux/bin/ -B/usr/x86_64-suse-linux/lib/ -isystem
/usr/x86_64-suse-linux/include -isystem /usr/x86_64-suse-linux/sys-include   
-DHAVE_CONFIG_H -I. -I../../../libgo  -I ../../../libgo/runtime
-I../../../libgo/../libffi/include -I../libffi/include -pthread
-L../libatomic/.libs  -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-O -fdump-go-spec=tmp-gen-sysinfo.go -std=gnu99 -S -o sysinfo.s
../../../libgo/sysinfo.c
[ 4342s] In file included from /usr/include/linux/fs.h:19,
[ 4342s]                  from ../../../libgo/sysinfo.c:162:
[ 4342s] /usr/include/linux/mount.h:95:6: error: redeclaration of 'enum fsconfig_command'
[ 4342s]    95 | enum fsconfig_command {
[ 4342s]       |      ^~~~
[ 4342s] In file included from ../../../libgo/sysinfo.c:141:
[ 4342s] /usr/include/sys/mount.h:189:6: note: originally defined here
[ 4342s]   189 | enum fsconfig_command
[ 4342s]       |      ^~~~

It's very similar to the following libsanitizer issue:
https://github.com/llvm/llvm-project/issues/56421

[Bug go/106266] Libgo fails with recent glibc

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106266

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-07-12
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Target Milestone|---                         |13.0

[Bug target/106265] RISC-V SPEC2017 507.cactu code bloat due to address generation

2022-07-12 Thread andrew at sifive dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106265

--- Comment #6 from Andrew Waterman  ---
To be clear, `li rx, 4096` isn't unsupported: it's a
very-much-supported idiom for `lui rx, 1`.


On Mon, Jul 11, 2022 at 11:45 PM rguenth at gcc dot gnu.org via
Gcc-bugs  wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106265
>
> --- Comment #5 from Richard Biener  ---
> So why do we even emit unsupported 'li 4096' and leave it to the linker to
> "optimize(?)"?  At least the cost of this should be reflected - IIRC powerpc
> recently got improvements for similar cases by changing the target's rtx_cost
> hook to properly cost SET from CONST_INT so that CSE doesn't leave so many
> sets from constants around.
>
> OTOH LRA rematerialization also could be the culprit, thinking rematerializing
> the constant is cheaper than spilling a register holding it.
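
The equivalence mentioned in comment #6 follows from how the two immediates
are encoded. A minimal sketch in C, assuming the base RISC-V semantics (`lui`
places a 20-bit immediate in bits 31:12; `addi` takes a 12-bit signed
immediate). The helper names are made up for illustration and are not part of
GCC or binutils.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value produced by `lui rd, imm20` (ignoring RV64 sign extension):
   the 20-bit immediate lands in bits 31:12 and the low 12 bits are zero.  */
uint32_t
lui_value (uint32_t imm20)
{
  return (imm20 & 0xfffffu) << 12;
}

/* True if V fits the 12-bit signed immediate of `addi`, i.e. if
   `li rd, V` could be a single `addi rd, zero, V`.  */
bool
fits_simm12 (int64_t v)
{
  return v >= -2048 && v <= 2047;
}
```

With these, `lui_value (1)` is 4096 while `fits_simm12 (4096)` is false, which
is why `li rx, 4096` is expected to assemble to `lui rx, 1`.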

[Bug preprocessor/106267] New: #pragma GCC diagnostic ignored not preserved for a -Wattribute-alias warning

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106267

Bug ID: 106267
   Summary: #pragma GCC diagnostic ignored not preserved for a
-Wattribute-alias warning
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: preprocessor
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
  Target Milestone: ---

For the following code snippet isolated from Linux kernel the ignored
-Wattribute-alias is not preserved:

$ cat posix-stubs.c
#define __diag_GCC(version, severity, s) \
  __diag_GCC_##version(__diag_GCC_##severity s)
#define __diag_GCC_ignore ignored
#define __diag_str1(s) #s
#define __diag_str(s) __diag_str1(s)
#define __diag(s) _Pragma(__diag_str(GCC diagnostic s))
#define __diag_GCC_8(s) __diag(s)
#define __diag_pop() __diag(pop)
#define __diag_push()   __diag(push)
#define __diag_ignore(compiler, version, option, comment) \
  __diag_##compiler(version, ignore, option)

#define SYSCALL_ALIAS_PROTO(a) \
__diag_push(); \
__diag_ignore(GCC, 8, "-Wattribute-alias", \
"Alias to nonimplemented syscall"); \
typeof(a) a __attribute__((alias("sys_ni_posix_timers"))); \
__diag_pop()

int sys_timer_create(int);
void sys_ni_posix_timers(void) {}

#define SYS_NI(name) SYSCALL_ALIAS_PROTO(sys_##name)
SYS_NI(timer_create)

$ gcc posix-stubs.c -c -Werror
posix-stubs.c:23:42: error: ‘sys_timer_create’ alias between functions of
incompatible types ‘int(int)’ and ‘void(void)’ [-Werror=attribute-alias=]
   23 | #define SYS_NI(name) SYSCALL_ALIAS_PROTO(sys_##name)
  |  ^~~~
posix-stubs.c:17:19: note: in definition of macro ‘SYSCALL_ALIAS_PROTO’
   17 | typeof(a) a __attribute__((alias("sys_ni_posix_timers"))); \
  |   ^
posix-stubs.c:24:1: note: in expansion of macro ‘SYS_NI’
   24 | SYS_NI(timer_create)
  | ^~
posix-stubs.c:21:6: note: aliased declaration here
   21 | void sys_ni_posix_timers(void) {}
  |  ^~~
cc1: all warnings being treated as errors

While using pre-processed input works correctly:

$ gcc posix-stubs.c -c -Werror -E > posix-stubs.i && gcc posix-stubs.i -c -Werror

[Bug preprocessor/106267] #pragma GCC diagnostic ignored not preserved for a -Wattribute-alias warning

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106267

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-07-12
             Status|UNCONFIRMED                 |NEW

[Bug fortran/106268] New: [suboptimal] Remove unnecessary loops related to fortran compare to ifort

2022-07-12 Thread zhongyunde at huawei dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106268

Bug ID: 106268
   Summary: [suboptimal] Remove unnecessary loops related to
fortran compare to ifort
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhongyunde at huawei dot com
  Target Milestone: ---

For the kernel inner loop body, GCC generates a loop while icc doesn't; see the
details at https://godbolt.org/z/G77nKnf8W.
```
DO i = 1, 1
  do l = 1, ADM_lall
    rhogw_vm(:,ADM_kmin  ,l) = 0.D0
    rhogw_vm(:,ADM_kmax+1,l) = 0.D0
  enddo
enddo
```

[Bug preprocessor/106267] #pragma GCC diagnostic ignored not preserved for a -Wattribute-alias warning

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106267

--- Comment #1 from Martin Liška  ---
Simplified test-case:

#define SYSCALL_ALIAS_PROTO(a) \
  _Pragma ("GCC diagnostic push"); \
  _Pragma ("GCC diagnostic ignored \"-Wattribute-alias\""); \
  typeof(a) a __attribute__((alias("sys_ni_posix_timers"))); \
  _Pragma ("GCC diagnostic pop");

int sys_timer_create(int);
void sys_ni_posix_timers(void) {}

SYSCALL_ALIAS_PROTO(sys_timer_create)

[Bug preprocessor/106267] #pragma GCC diagnostic ignored not preserved for a -Wattribute-alias warning

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106267

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #2 from Martin Liška  ---
Oh, it's fixed on master since r13-1596-g0587cef3d7962a8b.

*** This bug has been marked as a duplicate of bug 97498 ***

[Bug preprocessor/97498] #pragma GCC diagnostic ignored "-Wunused-function" inconsistent

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #5 from Martin Liška  ---
*** Bug 106267 has been marked as a duplicate of this bug. ***

[Bug preprocessor/97498] #pragma GCC diagnostic ignored "-Wunused-function" inconsistent

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498

--- Comment #6 from Martin Liška  ---
Am I correct that it's not something for backport to GCC 12 or any older
version?

[Bug preprocessor/97498] #pragma GCC diagnostic ignored "-Wunused-function" inconsistent

2022-07-12 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498

--- Comment #7 from Martin Liška  ---
(In reply to Martin Liška from comment #6)
> Am I correct that it's not something for backport to GCC 12 or any older
> version?

We are actually talking about just 2 modified lines, so can we backport it, please?

[Bug tree-optimization/99416] s211 benchmark of TSVC is vectorized by icc and not by gcc

2022-07-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99416

--- Comment #4 from Richard Biener  ---
(In reply to Richard Biener from comment #3)
> Note it's only the outer loop that confuses us here.  With that removed we
> have
> the following because of yet another "heuristic" to disable distribution.

In fact we first analyze the whole nest but then continue to look at the inner
loop only, so this isn't really an issue.

The fusing because of shared memory refs happens only because of the double use
of d[i]; b[i], b[i-1] or b[i+1] are not detected as problematic for
distribution (the "same memory object" check isn't working as intended).

Fuse partitions because they have shared memory refs:
  Part 1: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 16
  Part 2: 0, 1, 5, 9, 10, 11, 12, 13, 14, 15, 16

note the intersection of both partitions includes half of the stmts
(0, 1, 5, 6, 15, 16) that would be duplicated (5 is the d[i] load) while
the other half is different.

To defeat the final fusing reason we need a positive motivation, like
tracking whether we know a partition can or cannot be vectorized (or
whether we are not sure).  For the partition containing the b[i], b[i+1]
dependence distance of 1 we know we cannot vectorize (with a VF > 0).
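
To make the two partitions concrete, here is a source-level sketch of what
distributing this loop means. The kernel shape is recalled from TSVC s211 and
should be treated as an assumption (it does match the d[i], b[i], b[i-1] and
b[i+1] references discussed above); the function names are hypothetical and
the arrays are assumed not to alias.

```c
#include <stddef.h>

/* Fused form: both statements share one loop, as in the benchmark.  */
void
s211_fused (float *a, float *b, const float *c, const float *d,
            const float *e, size_t n)
{
  for (size_t i = 1; i + 1 < n; i++)
    {
      a[i] = b[i - 1] + c[i] * d[i];
      b[i] = b[i + 1] - e[i] * d[i];
    }
}

/* Distributed form: the b[] update runs first as its own loop, then the
   a[] loop reads the already-updated b[i - 1].  Note that d[i] is loaded
   in both loops; that is the shared memory reference.  */
void
s211_distributed (float *a, float *b, const float *c, const float *d,
                  const float *e, size_t n)
{
  for (size_t i = 1; i + 1 < n; i++)
    b[i] = b[i + 1] - e[i] * d[i];
  for (size_t i = 1; i + 1 < n; i++)
    a[i] = b[i - 1] + c[i] * d[i];
}
```

Both forms compute the same a[] and b[]: b[i + 1] is always read before it is
overwritten, and a[i] always sees the new b[i - 1] (or the untouched b[0]).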

[Bug fortran/106268] [suboptimal] Remove unnecessary loops related to fortran compare to ifort

2022-07-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106268

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2022-07-12
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
I think ICC elides the outer loop, it does keep the inner loop where GCC
unrolls the inner loop, keeping the outer loop.  There's no "benchmark loop"
elimination in GCC yet, instead we rely on code sinking which, at the moment,
cannot sink the memset calls out of the outer loop.

#include <string.h>  /* for memset */
void foo (int *mem, int n)
{
  for (int i = 0; i < 1000; ++i)
    memset (mem, 0, n * sizeof (int));
}

ICC probably manages to elide the loop around the memset here.
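
As a sketch of why eliding the loop is valid here: the memset is idempotent
and nothing else in the loop body depends on i, so one call has the same
observable effect as a thousand. The function names below are hypothetical.

```c
#include <string.h>

/* The loop from the comment above: the same memset repeated 1000 times.  */
void
foo_repeated (int *mem, int n)
{
  for (int i = 0; i < 1000; ++i)
    memset (mem, 0, n * sizeof (int));
}

/* What eliding (or sinking out of) the outer loop would leave behind.  */
void
foo_once (int *mem, int n)
{
  memset (mem, 0, n * sizeof (int));
}
```

Calling either function on equal buffers leaves them byte-for-byte identical.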

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org
             Target|aarch64-linux-gnu           |aarch64-linux-gnu,
                   |                            |arm-none-linux-gnueabihf

--- Comment #6 from Tamar Christina  ---
Same problem happens with Armhf when building libc.

during GIMPLE pass: vect
In file included from sha256.c:213:
./sha256-block.c: In function ‘__sha256_process_block’:
./sha256-block.c:6:1: internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032
    6 | __sha256_process_block (const void *buffer, size_t len, struct sha256_ctx *ctx)
      | ^~

[Bug c/45840] Enhance __builtin_object_size to return useful result when applied to T (*p)[N]

2022-07-12 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45840

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |redi at gcc dot gnu.org

--- Comment #2 from Jonathan Wakely  ---
For C++ I expected this to give a sensible answer:

size_t f(char (&s)[10]) { return __builtin_object_size(s, 0); }

It's true that you can explicitly cast a different type to char(&)[10] but any
attempt to access another type through that would (I think) be undefined.
Presumably __builtin_object_size is being used alongside some access ... if
not, who cares what it returns?

Obviously this example is contrived (the array size is known here) but it
matters for e.g.

std::istream& read(std::istream& in, char (&buf)[10])
{
  return in >> buf;
}

The operator>> in libstdc++ uses __builtin_object_size to avoid overflow here,
but it doesn't work because the size of char(&)[10] is not known.

If the function is called like read(std::cin, reinterpret_cast<char(&)[10]>(x))
then either x is a char array with at least 10 bytes (in which case it's
harmless to treat it as a smaller array) or it's not (in which case writing up
to 10 bytes to it is undefined anyway).

By extension, maybe this should give a known size too (but maybe this is a
separate enhancement request):

size_t f(char* s) { s[9] = '\0'; return __builtin_object_size(s, 0); }

If s[9] isn't valid, we already have UB. So in the absence of anything more
precise, 10 is a reasonable lower bound for the array size.
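
For the C form named in the bug title, T (*p)[N], the contrast can be sketched
as follows. The function names are hypothetical, and the second result is
deliberately not pinned down: it depends on the compiler version and on
whether the call gets inlined or specialised.

```c
#include <stddef.h>

/* The array is declared in the same function, so its size is visible
   and the builtin is expected to report 10.  */
size_t
size_of_local_array (void)
{
  char buf[10];
  buf[0] = '\0';
  return __builtin_object_size (buf, 0);
}

/* Only the pointer type says "array of 10 char".  Per this PR the
   builtin may report "unknown", i.e. (size_t) -1, instead of 10.  */
__attribute__ ((noinline)) size_t
size_via_array_pointer (char (*p)[10])
{
  return __builtin_object_size (*p, 0);
}
```

The enhancement requested in this PR amounts to making the second function
return 10 as well.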

[Bug tree-optimization/105860] [10/11/12/13 Regression] Miscompilation causing clobbered union contents since r10-918-gc56c86024f8fba0c

2022-07-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105860

--- Comment #9 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Martin Jambor:

https://gcc.gnu.org/g:16afe2e2862f3dd93c711d7f8d436dee23c6c34d

commit r11-10144-g16afe2e2862f3dd93c711d7f8d436dee23c6c34d
Author: Martin Jambor 
Date:   Tue Jul 12 13:16:35 2022 +0200

tree-sra: Fix union handling in build_reconstructed_reference

As the testcase in PR 105860 shows, the code that tries to re-use the
handled_component chains in SRA can be horribly confused by unions,
where it thinks it has found a compatible structure under which it can
chain the references, but in fact it found the type it was looking
for elsewhere in a union and generated a write to a completely wrong
part of an aggregate.

I don't remember whether the plan was to support unions at all in
build_reconstructed_reference but it can work, to an extent, if we
make sure that we start the search only outside the outermost union,
which is what the patch does (and the extra testcase verifies).

Additionally, this commit also contains, squashed into it, a backport of
b984b84cbe4bf026edef2ba37685f3958a1dc1cf which fixes the testcase
gcc.dg/tree-ssa/alias-access-path-13.c for many 32-bit targets.

gcc/ChangeLog:

2022-07-01  Martin Jambor  

PR tree-optimization/105860
* tree-sra.c (build_reconstructed_reference): Start expr
traversal only just below the outermost union.

gcc/testsuite/ChangeLog:

2022-07-01  Martin Jambor  

PR tree-optimization/105860
* gcc.dg/tree-ssa/alias-access-path-13.c: New test.
* gcc.dg/tree-ssa/pr105860.c: Likewise.

(cherry picked from commit b110e5283e368b5377e04766e4ff82cd52634208)

[Bug c++/106269] New: the "operator delete" selection does not follow c++ spec

2022-07-12 Thread chumarshal at foxmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106269

Bug ID: 106269
   Summary: the "operator delete" selection does not follow c++
spec
   Product: gcc
   Version: rust/master
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chumarshal at foxmail dot com
  Target Milestone: ---

// The header names were stripped by the archive; these two are what the
// code below needs (std::cout and std::size_t).
#include <iostream>
#include <cstddef>

void operator delete[](void* ptr) noexcept {
    std::cout << "delete with 1 parameters==" << std::endl;
    ::operator delete(ptr);
}

void operator delete[](void* ptr, std::size_t size) noexcept {
    std::cout << "delete with 2 parameters==" << std::endl;
    ::operator delete(ptr);
}

int main()
{
    int* p = new int[2];
    p[0] = 1;
    p[1] = 2;
    delete[] p;
}


The result is:
delete with 1 parameters==


But this does not follow the C++14 spec, which says:

C++14 Standard (ISO/IEC 14882:2014), Section 5.3.5, Paragraph 10:
"If the type is complete and if deallocation function lookup finds both a
usual deallocation function with only a pointer parameter and a usual
deallocation function with both a pointer parameter and a size parameter,
then the selected deallocation function shall be the one with two
parameters. Otherwise, the selected deallocation function shall be the
function with one parameter."

[Bug tree-optimization/106268] [suboptimal] Remove unnecessary loops related to fortran compare to ifort

2022-07-12 Thread zhongyunde at huawei dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106268

--- Comment #2 from vfdff  ---
It seems to be different for the C version; see the details at
https://godbolt.org/z/vc1edYKhf

In your case above, icc also doesn't elide the outer loop.

[Bug c++/106269] the "operator delete" selection does not follow c++ spec

2022-07-12 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106269

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Jonathan Wakely  ---
This is the correct behaviour, see https://wg21.link/cwg1788 which changed the
standard.

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #7 from Richard Biener  ---
Btw, I can see

FAIL: gcc.dg/vect/vect-rounding-lceil.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-rounding-lfloor.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)

FAIL: gfortran.dg/g77/20010430.f   -O2  (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)

(NINT user)

when testing aarch64-linux.  Those are all this aarch64 builtin issue (and
probably also reproduce on the arm backend side).

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #8 from CVS Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:00eab0c654e09c8a0f1b1a3b1c7bff8764e64991

commit r13-1647-g00eab0c654e09c8a0f1b1a3b1c7bff8764e64991
Author: Richard Sandiford 
Date:   Tue Jul 12 14:09:44 2022 +0100

Add internal functions for iround etc. [PR106253]

The PR is about the aarch64 port using an ACLE built-in function
to vectorise a scalar function call, even though the ECF_* flags for
the ACLE function didn't match the ECF_* flags for the scalar call.

To some extent that kind of difference is inevitable, since the
ACLE intrinsics are supposed to follow the behaviour of the
underlying instruction as closely as possible.  Also, using
target-specific builtins has the drawback of limiting further
gimple optimisation, since the gimple optimisers won't know what
the function does.

We handle several other maths functions, including round, floor
and ceil, by defining directly-mapped internal functions that
are linked to the associated built-in functions.  This has two
main advantages:

- it means that, internally, we are not restricted to the set of
  scalar types that happen to have associated C/C++ functions

- the functions (and thus the underlying optabs) extend naturally
  to vectors

This patch takes the same approach for the remaining functions
handled by aarch64_builtin_vectorized_function.

gcc/
PR target/106253
* predict.h (insn_optimization_type): Declare.
* predict.cc (insn_optimization_type): New function.
* internal-fn.def (IFN_ICEIL, IFN_IFLOOR, IFN_IRINT, IFN_IROUND)
(IFN_LCEIL, IFN_LFLOOR, IFN_LRINT, IFN_LROUND, IFN_LLCEIL)
(IFN_LLFLOOR, IFN_LLRINT, IFN_LLROUND): New internal functions.
* internal-fn.cc (unary_convert_direct): New macro.
(expand_convert_optab_fn): New function.
(expand_unary_convert_optab_fn): New macro.
(direct_unary_convert_optab_supported_p): Likewise.
* optabs.cc (expand_sfix_optab): Pass insn_optimization_type to
convert_optab_handler.
* config/aarch64/aarch64-protos.h
(aarch64_builtin_vectorized_function): Delete.
* config/aarch64/aarch64-builtins.cc
(aarch64_builtin_vectorized_function): Delete.
* config/aarch64/aarch64.cc
(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Delete.
* config/i386/i386.cc (ix86_optab_supported_p): Handle
lround_optab.
* config/i386/i386.md (lround2): Remove
optimize_insn_for_size_p test.

gcc/testsuite/
PR target/106253
* gcc.target/aarch64/vect_unary_1.c: Add tests for iroundf,
llround, iceilf, llceil, ifloorf, llfloor, irintf and llrint.
* gfortran.dg/vect/pr106253.f: New test.

[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

rsandifo at gcc dot gnu.org  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org

--- Comment #9 from rsandifo at gcc dot gnu.org ---
Fixed for aarch64.  I'll do the same thing for arm.

[Bug c++/105989] Coroutine frame space for temporaries in a co_await expression is not reused

2022-07-12 Thread michal.jankovic59 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105989

--- Comment #4 from Michal Jankovič  ---
Comment on attachment 53273
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53273
Experimental patch implementing the proposed transformation

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index edb3b706ddc..ed1ac4decaf 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1997,6 +1997,7 @@ struct local_var_info
   bool is_static;
   bool has_value_expr_p;
   location_t def_loc;
+  vec *field_access_path;
 };

 /* For figuring out what local variable usage we have.  */
@@ -2009,6 +2010,26 @@ struct local_vars_transform
   hash_map *local_var_uses;
 };

+/* Build a COMPONENT_REF chain for accessing a nested variable in the coroutine
+   frame.  */
+static tree
+build_local_var_frame_access_expr (local_vars_transform *lvt,
+  local_var_info *local_var)
+{
+  tree access_expr = lvt->actor_frame;
+
+  for (tree path_elem_id : *local_var->field_access_path)
+{
+  tree path_elem_member = lookup_member (
+   TREE_TYPE (access_expr), path_elem_id, 1, 0, tf_warning_or_error);
+  access_expr = build3_loc (
+   lvt->loc, COMPONENT_REF, TREE_TYPE (path_elem_member),
+   access_expr, path_elem_member, NULL_TREE);
+}
+
+  return access_expr;
+}
+
 static tree
 transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
 {
@@ -2040,12 +2061,7 @@ transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
  if (local_var.field_id == NULL_TREE)
continue; /* Wasn't used.  */

- tree fld_ref
-   = lookup_member (lvd->coro_frame_type, local_var.field_id,
-/*protect=*/1, /*want_type=*/0,
-tf_warning_or_error);
- tree fld_idx = build3_loc (lvd->loc, COMPONENT_REF, TREE_TYPE (lvar),
-lvd->actor_frame, fld_ref, NULL_TREE);
+ tree fld_idx = build_local_var_frame_access_expr (lvd, &local_var);
  local_var.field_idx = fld_idx;
  SET_DECL_VALUE_EXPR (lvar, fld_idx);
  DECL_HAS_VALUE_EXPR_P (lvar) = true;
@@ -3873,14 +3889,24 @@ analyze_fn_parms (tree orig)
 /* Small helper for the repetitive task of adding a new field to the coro
frame type.  */

+static void
+coro_make_frame_entry_id (tree *field_list, tree id, tree fld_type,
+ location_t loc)
+{
+  tree decl = build_decl (loc, FIELD_DECL, id, fld_type);
+  DECL_CHAIN (decl) = *field_list;
+  *field_list = decl;
+}
+
+/* Same as coro_make_frame_entry_id, but creates an identifier from string.  */
+
 static tree
 coro_make_frame_entry (tree *field_list, const char *name, tree fld_type,
   location_t loc)
 {
   tree id = get_identifier (name);
-  tree decl = build_decl (loc, FIELD_DECL, id, fld_type);
-  DECL_CHAIN (decl) = *field_list;
-  *field_list = decl;
+  coro_make_frame_entry_id (field_list, id, fld_type, loc);
+
   return id;
 }

@@ -3894,6 +3920,8 @@ struct local_vars_frame_data
   location_t loc;
   bool saw_capture;
   bool local_var_seen;
+  tree orig;
+  vec *field_access_path;
 };

 /* A tree-walk callback that processes one bind expression noting local
@@ -3912,6 +3940,21 @@ register_local_var_uses (tree *stmt, int *do_subtree, void *d)

   if (TREE_CODE (*stmt) == BIND_EXPR)
 {
+  tree scope_field_id = NULL_TREE;
+  if (lvd->nest_depth != 0)
+   {
+ /* Create identifier under which fields for this bind-expression will
+be accessed.  */
+ char *scope_field_name
+   = xasprintf ("_Scope%u_%u", lvd->nest_depth, lvd->bind_indx);
+ scope_field_id = get_identifier (scope_field_name);
+ free (scope_field_name);
+
+ vec_safe_push (lvd->field_access_path, scope_field_id);
+   }
+
+  tree scope_variables = NULL_TREE;
+
   tree lvar;
   unsigned serial = 0;
   for (lvar = BIND_EXPR_VARS (*stmt); lvar != NULL;
@@ -3980,17 +4023,99 @@ register_local_var_uses (tree *stmt, int *do_subtree, void *d)

  /* TODO: Figure out if we should build a local type that has any
 excess alignment or size from the original decl.  */
- local_var.field_id = coro_make_frame_entry (lvd->field_list, buf,
+
+ local_var.field_id = coro_make_frame_entry (&scope_variables, buf,
  lvtype, lvd->loc);
  free (buf);
  /* We don't walk any of the local var sub-trees, they won't contain
 any bind exprs.  */
+
+ local_var.field_access_path = make_tree_vector_copy (
+   lvd->field_access_path);
+ vec_safe_push (local_var.field_access_path, local_var.field_id);
}
+
+  unsigned bind_indx = lvd->bind_indx;
+  tree* parent_field_list = lvd->field_list;
+
+  /* Collect the scope structs of child bind-expressions when recursing.  */
+  tree child_scopes = NULL_TREE;
+

[Bug c++/66290] wrong location for -Wunused-macros

2022-07-12 Thread lhyatt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290

--- Comment #4 from Lewis Hyatt  ---
OK, I understand now why done_lexing is necessary, plenty of places call back
into libcpp after lexing, e.g. to interpret strings, and this may generate
warnings.
I think that one line patch is the way to go then.

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index b9f01a65ed7..25a3c50de8e 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1283,6 +1283,7 @@ c_common_finish (void)

   /* For performance, avoid tearing down cpplib's internal structures
  with cpp_destroy ().  */
+  done_lexing = false;
   cpp_finish (parse_in, deps_stream);

   if (deps_stream && deps_stream != out_stream && deps_stream != stdout

[Bug c++/105989] Coroutine frame space for temporaries in a co_await expression is not reused

2022-07-12 Thread michal.jankovic59 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105989

Michal Jankovič  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #53273|0                           |1
        is obsolete|                            |

--- Comment #5 from Michal Jankovič  ---
Created attachment 53290
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53290&action=edit
Patch implementing the proposed transformation

Submitted patch implementing the proposed transformation, along with a test
case.

[Bug c++/105912] internal compiler error: in extract_call_expr, at cp/call.cc:7114

2022-07-12 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105912

Patrick Palka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ppalka at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |ppalka at gcc dot gnu.org

[Bug target/106270] New: [Aarch64] -mlong-calls should be provided on aarch64 for users with large applications

2022-07-12 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106270

Bug ID: 106270
   Summary: [Aarch64] -mlong-calls should be provided on aarch64
for users with large applications
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: qinzhao at gcc dot gnu.org
  Target Milestone: ---

Created attachment 53291
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53291&action=edit
tar ball for the small testing case

In the aarch64 backend, aarch64_is_long_call_p always returns FALSE by default;
as a result, direct calls (bl) are generated for all calls.

For smaller applications, generating direct calls for all calls gives good
run-time performance. However, for larger applications, when the size of the
.text section exceeds a certain limit, the linker fails with the following
error:

 relocation truncated to fit: R_AARCH64_CALL26 against symbol 

This can be shown with the small test case in the attached tarball:

1. download the tarball and untar it;
2. cd aarch64_long_call
3. update "GCC" in "build.sh" and "assemble.sh" to point to your own gcc (gcc 8
and above all have the same issue).
4. sh do_all.sh

gcc -S bar.c foo.c
patching file bar.s
Hunk #1 succeeded at 21 (offset -2 lines).
gcc -c bar.s foo.s
ld: warning: cannot find entry symbol _start; defaulting to 1000
bar.o: In function `bar':
bar.c:(.text+0x10): relocation truncated to fit: R_AARCH64_CALL26 against
symbol `foo' defined in .text section in foo.o

[Bug target/106270] [Aarch64] -mlong-calls should be provided on aarch64 for users with large applications

2022-07-12 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106270

--- Comment #1 from qinzhao at gcc dot gnu.org ---
from Jose Marchesi:


We looked at this issue and these are our findings.

- When this problem happens:

  When the linker (ld) fails to insert a veneer (which transforms the
  immediate bl jumps generated by GCC into an indirect call) within range
  of the original bl instruction.

  This happens when the program contains .text sections bigger than
  ~128MiB, or when the distance between a bl instruction and the
  beginning or end of the containing section exceeds ~128MiB.

  Note that this also happens with linker-generated PLT sections when
  linking shared objects, much like the veneers.

- Proposed solutions: 

  We can think on two complementary solutions: 

  a) At the moment the aarch64 linker seems to only know how to insert a 
 veneer _after_ the section that contains the branch/call 
 instruction.  We could make the linker smarter so it knows how to 
 insert the veneer _before_ the sect on containing the branch/call 
 instruction. 

 This would make things better, but obviously would not work in all 
 cases. 

  b) So, in any case, we propose to add a new option to GCC (aarch64 
 specific) that will make GCC generate indirect branch 
 instructions (blr) for non-PLT calls to symbols not having local 
 binding/scope: 

 -mlong-calls -> Make aarch64_is_long_call_p return true 
  -> This makes GCC generate blr instead of bl 
 when compiling non-PIC code and calls to symbol 
 references with non-local binding. 
  -> This will have an impact on performance, but 
 this will _not_ impact inter-section calls or 
 calls to local objects.
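
As a rough source-level illustration of what option (b) is after, a call can be routed through a function pointer the compiler cannot see through, which typically yields an indirect call (blr on aarch64) instead of a range-limited bl. The names foo, foo_ptr and call_foo_long are hypothetical, and this is only a sketch of the effect, not of how the proposed -mlong-calls would be implemented inside GCC.

```cpp
// Hypothetical callee; in the report this would be a symbol defined in
// another object file, possibly more than ~128MiB away from the caller.
int foo (int x) { return x + 1; }

// A volatile pointer keeps the compiler from folding the call back into a
// direct one, so the target address is loaded into a register first.
int (*volatile foo_ptr) (int) = foo;

int call_foo_long (int x)
{
  // Indirect call: no 26-bit displacement has to be encoded, so the
  // distance between caller and callee no longer matters.
  return foo_ptr (x);
}
```

The cost is the one mentioned above: every such call needs an extra load and an indirect branch.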

[Bug tree-optimization/106249] [13 Regression] ICE in check_loop_closed_ssa_def, at tree-ssa-loop-manip.cc:645

2022-07-12 Thread asolokha at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106249

--- Comment #4 from Arseny Solokha  ---
Finally, a C testcase, and w/o -funreachable-traps:

void
foo (double *arr)
{
  int i, j;

  for (i = 0; i < 4; ++i)
    for (j = 0; j < 4; ++j)
      arr[j] = 0;

  for (i = 1; i < 4; ++i)
    for (j = 0; j < 4; ++j)
      arr[j] = 1.0 / (i + 1);
}

% gcc-13.0.0 -O1 -floop-unroll-and-jam --param unroll-jam-min-percent=0 -c
o87rfyb9.c
during GIMPLE pass: unrolljam
o87rfyb9.c: In function 'foo':
o87rfyb9.c:2:1: internal compiler error: in check_loop_closed_ssa_def, at
tree-ssa-loop-manip.cc:645
2 | foo (double *arr)
  | ^~~
0x772a0b check_loop_closed_ssa_def
   
/var/tmp/portage/sys-devel/gcc-13.0.0_p20220710/work/gcc-13-20220710/gcc/tree-ssa-loop-manip.cc:645
0x1064e6f check_loop_closed_ssa_bb
   
/var/tmp/portage/sys-devel/gcc-13.0.0_p20220710/work/gcc-13-20220710/gcc/tree-ssa-loop-manip.cc:670
0x1066116 verify_loop_closed_ssa(bool, loop*)
   
/var/tmp/portage/sys-devel/gcc-13.0.0_p20220710/work/gcc-13-20220710/gcc/tree-ssa-loop-manip.cc:695
0x1066116 verify_loop_closed_ssa(bool, loop*)
   
/var/tmp/portage/sys-devel/gcc-13.0.0_p20220710/work/gcc-13-20220710/gcc/tree-ssa-loop-manip.cc:679
0x1068849 checking_verify_loop_closed_ssa
   
/var/tmp/portage/sys-devel/gcc-13.0.0_p20220710/work/gcc-13-20220710/gcc/tree-ssa-loop-manip.h:34
0x1068849 tree_transform_and_unroll_loop(loop*, unsigned int, tree_niter_desc*,
void (*)(loop*, void*), void*)
   
/var/tmp/portage/sys-devel/gcc-13.0.0_p20220710/work/gcc-13-20220710/gcc/tree-ssa-loop-manip.cc:1431
0x1cf56cc tree_loop_unroll_and_jam
   
/var/tmp/portage/sys-devel/gcc-13.0.0_p20220710/work/gcc-13-20220710/gcc/gimple-loop-jam.cc:595

[Bug target/106270] [Aarch64] -mlong-calls should be provided on aarch64 for users with large applications

2022-07-12 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106270

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #2 from Wilco  ---
GCC will crash well before reaching 128 MBytes of .text. So what is the real
underlying problem?

Note that GCC could split huge .text sections automatically to allow insertion
of linker veneers every 128MB. So -mlong-calls is simply an incorrect solution
for a problem that doesn't exist yet...

[Bug target/106270] [Aarch64] -mlong-calls should be provided on aarch64 for users with large applications

2022-07-12 Thread jose.marchesi at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106270

--- Comment #3 from Jose E. Marchesi  ---
Wilco: The assessment in comment 1 was extracted from an internal discussion on
an issue that is still under investigation.  We are certainly hitting a
cant-reach-the-linker-generated-veneer problem, but it is not fully clear to us
how since it is getting difficult to get proper reproducers.

In any case, the idea of splitting the text section by the compiler is
interesting, and a much better solution than -mlong-calls since it wouldn't
involve generating unnecessary indirect branches.

But how would the back-end keep track of the size of the code it generates? 
Using insn size attributes?

[Bug target/106271] New: Bootstrap on RISC-V on Ubuntu 22.04 LTS: bits/libc-header-start.h: No such file or directory

2022-07-12 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106271

Bug ID: 106271
   Summary: Bootstrap on RISC-V on Ubuntu 22.04 LTS:
bits/libc-header-start.h: No such file or directory
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

I thought I would give the new gcc92 machine a spin and tried
bootstrapping gcc on it.

Configure was done with

../Gcc/configure --disable-multilib --prefix=$HOME
--enable-languages=c,c++,fortran

and the last few lines of output were

/home/tkoenig/trunk-bin/./gcc/xgcc -B/home/tkoenig/trunk-bin/./gcc/
-B/home/tkoenig/riscv64-unknown-linux-gnu/bin/
-B/home/tkoenig/riscv64-unknown-linux-gnu/lib/ -isystem
/home/tkoenig/riscv64-unknown-linux-gnu/include -isystem
/home/tkoenig/riscv64-unknown-linux-gnu/sys-include   -fno-checking -g -O2 -O2 
-g -O2 -DIN_GCC-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual
-Wno-format -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition 
-isystem ./include  -fPIC -g -DIN_LIBGCC2 -fbuilding-libgcc
-fno-stack-protector  -fPIC -I. -I. -I../.././gcc -I../../../Gcc/libgcc
-I../../../Gcc/libgcc/. -I../../../Gcc/libgcc/../gcc
-I../../../Gcc/libgcc/../include  -DHAVE_CC_TLS   -o _ashrdi3.o -MT _ashrdi3.o
-MD -MP -MF _ashrdi3.dep -DL_ashrdi3 -c ../../../Gcc/libgcc/libgcc2.c
-fvisibility=hidden -DHIDE_EXPORTS
In file included from ../../../Gcc/libgcc/../gcc/tsystem.h:87,
 from ../../../Gcc/libgcc/libgcc2.c:27:
/usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: No such file
or directory
   27 | #include <bits/libc-header-start.h>
  |  ^~
In file included from ../../../Gcc/libgcc/../gcc/tsystem.h:87,
 from ../../../Gcc/libgcc/libgcc2.c:27:
/usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: No such file
or directory
   27 | #include <bits/libc-header-start.h>
  |  ^~
compilation terminated.
compilation terminated.
In file included from ../../../Gcc/libgcc/../gcc/tsystem.h:87,
 from ../../../Gcc/libgcc/libgcc2.c:27:
/usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: No such file
or directory
   27 | #include <bits/libc-header-start.h>
  |  ^~
compilation terminated.
make[3]: *** [Makefile:501: _negdi2.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[3]: *** [Makefile:501: _lshrdi3.o] Error 1
make[3]: *** [Makefile:501: _ashldi3.o] Error 1
In file included from ../../../Gcc/libgcc/../gcc/tsystem.h:87,
 from ../../../Gcc/libgcc/libgcc2.c:27:
/usr/include/stdio.h:27:10: fatal error: bits/libc-header-start.h: No such file
or directory
   27 | #include <bits/libc-header-start.h>
  |  ^~
compilation terminated.
make[3]: *** [Makefile:501: _ashrdi3.o] Error 1

[Bug target/106270] [Aarch64] -mlong-calls should be provided on aarch64 for users with large applications

2022-07-12 Thread qing.zhao at oracle dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106270

--- Comment #4 from Qing Zhao  ---
> On Jul 12, 2022, at 1:02 PM, wilco at gcc dot gnu.org 
>  wrote:
> 
> Note that GCC could split huge .text sections automatically to allow insertion
> of linker veneers every 128MB.

Does GCC do this by default? Is any option needed for this functionality?

[Bug target/106271] Bootstrap on RISC-V on Ubuntu 22.04 LTS: bits/libc-header-start.h: No such file or directory

2022-07-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106271

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1
   Last reconfirmed||2022-07-12

--- Comment #1 from Andrew Pinski  ---
I suspect configure is not detecting multi-arch correctly or --disable-multilib
interacting with multi-arch support which causes things to be broken.
OR multi-arch support is not in the riscv backend yet.

[Bug fortran/106049] ICE in gfc_simplify_pack, at fortran/simplify.cc:6481

2022-07-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106049

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:6e9d5dfc2911e3acc6039ebfe3837e7ba4be197f

commit r13-1650-g6e9d5dfc2911e3acc6039ebfe3837e7ba4be197f
Author: Harald Anlauf 
Date:   Tue Jul 5 22:20:05 2022 +0200

Fortran: error recovery simplifying PACK with invalid arguments [PR106049]

gcc/fortran/ChangeLog:

PR fortran/106049
* simplify.cc (is_constant_array_expr): A non-zero-sized constant
array shall have a non-empty constructor.  When the constructor is
empty or missing, treat as non-constant.

gcc/testsuite/ChangeLog:

PR fortran/106049
* gfortran.dg/pack_simplify_1.f90: New test.

[Bug fortran/106049] ICE in gfc_simplify_pack, at fortran/simplify.cc:6481

2022-07-12 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106049

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from anlauf at gcc dot gnu.org ---
Fixed.

[Bug target/106270] [Aarch64] -mlong-calls should be provided on aarch64 for users with large applications

2022-07-12 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106270

--- Comment #5 from Wilco  ---
(In reply to Jose E. Marchesi from comment #3)
> Wilco: The assessment in comment 1 was extracted from an internal discussion
> on an issue that is still under investigation.  We are certainly hitting a
> cant-reach-the-linker-generated-veneer problem, but it is not fully clear to
> us how since it is getting difficult to get proper reproducers.

It is worth checking you're using a recent binutils since old ones had a bug in
the veneer code (https://sourceware.org/bugzilla/show_bug.cgi?id=25665).

You can hit offset limits easily if you use a linker script which places text
sections very far apart. As the example shows, incorrect use of alignment
directives can cause issues as well. Ideally the assembler should give a
warning if there is a text section larger than 127 MB.

> In any case, the idea of splitting the text section by the compiler is
> interesting, and a much better solution than -mlong-calls since it wouldn't
> involve generating unnecessary indirect branches.
> 
> But how would the back-end keep track of the size of the code it generates? 
> Using insn size attributes?

Yes, GCC tracks branch ranges. CBZ and TBZ have a small range and are
automatically handled if out of range. IIRC GCC doesn't yet extend Bcc, so if a
single function is over 1MB, GCC won't be able to compile it.
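
The ranges mentioned in this thread follow from the width of each instruction's signed, word-scaled immediate. A minimal sketch of that arithmetic, assuming the usual AArch64 encodings (imm26 for B/BL, imm19 for B.cond and CBZ/CBNZ, imm14 for TBZ/TBNZ); branch_reach and in_reach are hypothetical helper names, not GCC or binutils functions:

```cpp
#include <cstdint>

// Reach (in bytes, in each direction) of a PC-relative AArch64 branch whose
// signed immediate has `imm_bits` bits and is scaled by 4.
constexpr std::int64_t branch_reach (int imm_bits)
{
  return (std::int64_t{1} << (imm_bits - 1)) * 4;
}

constexpr std::int64_t KiB = 1024, MiB = 1024 * KiB;

static_assert (branch_reach (26) == 128 * MiB, "B/BL: imm26");
static_assert (branch_reach (19) == 1 * MiB, "B.cond, CBZ/CBNZ: imm19");
static_assert (branch_reach (14) == 32 * KiB, "TBZ/TBNZ: imm14");

// True if a branch at address `from` can encode a direct jump to `to`.
// Both addresses are assumed to be 4-byte aligned.
constexpr bool in_reach (std::int64_t from, std::int64_t to, int imm_bits)
{
  return to - from >= -branch_reach (imm_bits)
         && to - from < branch_reach (imm_bits);
}
```

A bl whose target fails in_reach (from, to, 26) is the case where the linker needs a veneer, and the veneer itself must then be placed within that same reach.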

[Bug target/106270] [Aarch64] -mlong-calls should be provided on aarch64 for users with large applications

2022-07-12 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106270

--- Comment #6 from Wilco  ---
(In reply to Qing Zhao from comment #4)
> > On Jul 12, 2022, at 1:02 PM, wilco at gcc dot gnu.org 
> >  wrote:
> > 
> > Note that GCC could split huge .text sections automatically to allow 
> > insertion
> > of linker veneers every 128MB.
> 
> Does GCC do this by default? Is any option needed for this functionality?

No, currently it is not able to reach this limit, but once it can, it should be
done automatically.

[Bug c/106272] New: clang build: new warning ?

2022-07-12 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106272

Bug ID: 106272
   Summary: clang build: new warning ?
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

Recent gcc builds with clang say this:

libcpp/include/line-map.h:1882:12: warning: moving a temporary object prevents
copy elision [-Wpessimizing-move]

I have little idea what that means, but the line of code is

return std::move (label_text (buffer, true));

This new warning appears about 700 times, so it might be important.

[Bug preprocessor/106272] clang build: new warning ?

2022-07-12 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106272

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2022-07-12
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Jonathan Wakely  ---
(In reply to David Binderman from comment #0)
> This new warning appears about 700 times, so it might be important.

It's not. But the move is useless and shouldn't be there.
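
For readers unfamiliar with the warning, here is a small sketch of what "moving a temporary object prevents copy elision" means. The type Counted and the two factory functions are hypothetical and unrelated to label_text; the sketch only assumes a compiler that elides the copy when a temporary is returned directly (guaranteed from C++17 on, and done by default by GCC and clang before that).

```cpp
#include <utility>

// Hypothetical type that counts how often its move constructor runs.
struct Counted
{
  static int moves;
  Counted () = default;
  Counted (const Counted &) = default;
  Counted (Counted &&) noexcept { ++moves; }
};
int Counted::moves = 0;

// Returning the temporary directly: it is constructed in the caller's
// storage, so no move constructor call is needed.
Counted make_plain () { return Counted (); }

// Wrapping it in std::move turns the operand into an xvalue, which
// disables elision and forces one move construction.
Counted make_moved () { return std::move (Counted ()); }
```

For a cheap-to-move type the extra move costs little, which fits the assessment that the warning is not important but the std::move should still go.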

[Bug target/106273] New: [13 Regression] wrong code with -Og -march=cascadelake (due to ANDN?)

2022-07-12 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106273

Bug ID: 106273
   Summary: [13 Regression] wrong code with -Og -march=cascadelake
(due to ANDN?)
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 53292
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53292&action=edit
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc -Og -march=cascadelake testcase.c
$ ./a.out 
Aborted

The value is 10 instead of 5.


$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r13-1649-20220712164051-gcab411a2b4b-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/13.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r13-1649-20220712164051-gcab411a2b4b-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.0 20220712 (experimental) (GCC)

[Bug target/106273] [13 Regression] wrong code with -Og -march=cascadelake (due to ANDN?)

2022-07-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106273

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |13.0

[Bug preprocessor/106272] clang build: new warning ?

2022-07-12 Thread dcb314 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106272

--- Comment #2 from David Binderman  ---
I just noticed a similar warning four lines earlier:

libcpp/include/line-map.h:1876:12: warning: moving a temporary object prevents
copy elision [-Wpessimizing-move]

Source code is

return std::move (label_text (const_cast <char *> (buffer), false));

About 700 mentions of this one, as well.

[Bug lto/106274] New: Loss of macro tracking information with -flto

2022-07-12 Thread lhyatt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106274

Bug ID: 106274
   Summary: Loss of macro tracking information with -flto
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lhyatt at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

This is related to PR101551 but easier to demonstrate a testcase for:

===
#define X(p) p == 0
int f(void *) __attribute__((nonnull));
int f(void *p) {
return X(p);
}


When compiled without -flto, there is information on the macro expansion in the
diagnostics:

$ gcc -c t5.c -Wnonnull-compare
/home/lewis/t5.c: In function ‘f’:
/home/lewis/t5.c:1:16: warning: nonnull argument ‘p’ compared to NULL
[-Wnonnull-compare]
1 | #define X(p) p == 0
  |^
/home/lewis/t5.c:4:12: note: in expansion of macro ‘X’
4 | return X(p);
  |^

However, if you add -flto, you don't get the extra information:

$ gcc -c t5.c -Wnonnull-compare -flto
/home/lewis/t5.c: In function ‘f’:
/home/lewis/t5.c:1:16: warning: nonnull argument ‘p’ compared to NULL
[-Wnonnull-compare]
1 | #define X(p) p == 0
  |^

The reason is that this warning is generated after the ipa_free_lang_data pass,
and that does this:


  /* If we are the LTO frontend we have freed lang-specific data already.  */
  if (in_lto_p
      || (!flag_generate_lto && !flag_generate_offload))
    {
      /* Rebuild type inheritance graph even when not doing LTO to get
         consistent profile data.  */
      rebuild_type_inheritance_graph ();
      return 0;
    }

...

/* Reset diagnostic machinery.  */
tree_diagnostics_defaults (global_dc);



With -flto, flag_generate_lto is true, so it doesn't return early, and proceeds
to the last line, which resets the diagnostic finalizer to
default_diagnostic_finalizer, which is not aware of virtual locations.

PR101551 is about more or less the same thing except it's the other case that
prevents this function from returning early (flag_generate_offload == true). I
am not sure to what extent they are otherwise related.

Is it possible to avoid resetting the diagnostics machinery in either of these
cases? Thanks...

[Bug target/106265] RISC-V SPEC2017 507.cactu code bloat due to address generation

2022-07-12 Thread vineet.gupta at linux dot dev via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106265

--- Comment #7 from Vineet Gupta  ---
(In reply to Richard Biener from comment #5)
> So why do we even emit unsupported 'li 4096' and leave it to the linker to
> "optimize(?)"? 

li 4096 is really a pseudo-op - LUI is used to build 32-bit constants. For this
problem whether LI or LUI is used is just a detail.

> OTOH LRA rematerialization also could be the culprit, thinking rematerializing
> the constant is cheaper than spilling a register holding it.

Thanks for the pointer. I tried disabling it with -fno-lra-remat and it
generates exactly the same code.
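
To make the LI/LUI remark concrete: a 32-bit constant is normally built as LUI (upper 20 bits) plus ADDI (signed 12-bit immediate), and "li 4096" needs only the LUI half. A minimal sketch of that split, with the hypothetical names HiLo and split_const, modelling 32-bit arithmetic only (RV64 sign extension of the result is ignored):

```cpp
#include <cstdint>

struct HiLo
{
  std::uint32_t hi20;  // immediate for LUI
  std::int32_t lo12;   // immediate for ADDI, in [-2048, 2047]
};

// Split `value` so that (hi20 << 12) + lo12 == value modulo 2^32.
constexpr HiLo split_const (std::uint32_t value)
{
  // Low 12 bits, sign-extended the way ADDI interprets its immediate.
  std::int32_t lo
    = static_cast<std::int32_t> ((value & 0xFFFu) ^ 0x800u) - 0x800;
  // Remaining bits; the subtraction wraps modulo 2^32 like the hardware.
  std::uint32_t hi = (value - static_cast<std::uint32_t> (lo)) >> 12;
  return HiLo{hi, lo};
}

// li 4096 is just "lui rd, 1": the low part is zero.
static_assert (split_const (4096).hi20 == 1, "hi part of 4096");
static_assert (split_const (4096).lo12 == 0, "lo part of 4096");
```

When the low part is zero, as here, the whole constant costs a single instruction, so whether it is spelled LI or LUI does not change the code size question.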

[Bug preprocessor/106272] clang build: new warning ?

2022-07-12 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106272

Eric Gallager  changed:

   What|Removed |Added

   Keywords||build, diagnostic
 CC||egallager at gcc dot gnu.org

--- Comment #3 from Eric Gallager  ---
Note that GCC has its own version of -Wpessimizing-move, too... any idea why
clang's version of the flag catches it, but gcc's doesn't?

[Bug target/106271] Bootstrap on RISC-V on Ubuntu 22.04 LTS: bits/libc-header-start.h: No such file or directory

2022-07-12 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106271

--- Comment #2 from Thomas Koenig  ---
(In reply to Andrew Pinski from comment #1)
> I suspect configure is not detecting multi-arch correctly or
> --disable-multilib interacting with multi-arch support which causes things
> to be broken.
> OR multi-arch support is not in the riscv backend yet.

I get the same result without --disable-multilib.

[Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness

2022-07-12 Thread cuzdav at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

Bug ID: 106275
   Summary: unordered_map with std::string key,
std::hash<std::string>, and custom equality predicate
weirdness
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cuzdav at gmail dot com
  Target Milestone: ---

g++ 12.1 (on Linux). Does not occur on GCC 11 or earlier.

std::unordered_map<std::string, int, std::hash<std::string>, CustomPred> acts
strangely such that finds do not seem to use the hasher properly, and seem to
use a linear search, invoking the equality predicate against every key.  

A custom hasher implemented in terms of std::hash<std::string> fixes it, as
does using the default equality predicate.  It also does not happen with key
type of, say, "int".  I've only seen it for std::string (in my limited
experimentation.)

I added output to the predicate to indicate when it's called, and it shows
excessive calls printed when "SHOWBUG" macro is defined.



#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

struct EqualToWrapper{
    bool operator()(const std::string& key1, const std::string& key2) const {
        std::cout << "equal_to(key1=" << key1 << ", key2=" << key2 << ")\n";
        return std::equal_to<>{}(key1, key2);
    }
};

#ifdef SHOWBUG
using UsageMap =  std::unordered_map<std::string, int, std::hash<std::string>,
EqualToWrapper>;
#else
struct MyHash : public std::hash<std::string> {};
using UsageMap =  std::unordered_map<std::string, int, MyHash, EqualToWrapper>;
#endif

int main() {
    UsageMap m;
    m.insert(std::make_pair("A", 111));
    m.insert(std::make_pair("B", 222));
    m.insert(std::make_pair("C", 333));
    m.insert(std::make_pair("D", 444));
    m.insert(std::make_pair("E", 555));
    m.insert(std::make_pair("F", 666));
    m.find("foo");
}



With a custom equality predicate and my derived-from-std::hash hasher, output
on g++ 12 is:

  equal_to(key1=C, key2=A)   

When run with -DSHOWBUG macro defined, output is:
  equal_to(key1=B, key2=A)
  equal_to(key1=C, key2=B)
  equal_to(key1=C, key2=A)
  equal_to(key1=D, key2=B)
  equal_to(key1=D, key2=C)
  equal_to(key1=D, key2=A)
  equal_to(key1=E, key2=B)
  equal_to(key1=E, key2=D)
  equal_to(key1=E, key2=C)
  equal_to(key1=E, key2=A)
  equal_to(key1=F, key2=E)
  equal_to(key1=F, key2=B)
  equal_to(key1=F, key2=D)
  equal_to(key1=F, key2=C)
  equal_to(key1=F, key2=A)
  equal_to(key1=foo, key2=F)
  equal_to(key1=foo, key2=E)
  equal_to(key1=foo, key2=B)
  equal_to(key1=foo, key2=D)
  equal_to(key1=foo, key2=C)
  equal_to(key1=foo, key2=A)



On Godbolt:
https://godbolt.org/z/GP5dox1qs



$ g++ -v 
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/opt/imc/gcc-12.1.0/libexec/gcc/x86_64-pc-linux-gnu/12.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-12.1.0/configure --prefix=/opt/imc/gcc-12.1.0
--enable-languages=c,c++,fortran,lto --disable-multilib
--with-build-time-tools=/build/INSTALLDIR//opt/imc/gcc-12.1.0/bin
--enable-libstdcxx-time=rt
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.1.0 (GCC)

[Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness

2022-07-12 Thread cuzdav at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #1 from Chris Uzdavinis  ---
Created attachment 53293
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53293&action=edit
preprocessed code

[Bug tree-optimization/106246] [13 Regression] powerpc-darwin9 bootstrap fails after r13-1575-gcf3a120084e946 ICE vect_transform_loops, at tree-vectorizer.cc:1032

2022-07-12 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106246

Iain Sandoe  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #8 from Iain Sandoe  ---
indeed, fixed by r13-1603-g415d2c38edad
thanks.

[Bug libstdc++/106248] [11/12/13 Regression] operator>>std::basic_istream at boundary condition behave differently in different opt levels

2022-07-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106248

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:5ae74944af1de032d4a27fad4a2287bd3a2163fd

commit r13-1651-g5ae74944af1de032d4a27fad4a2287bd3a2163fd
Author: Jonathan Wakely 
Date:   Tue Jul 12 11:18:47 2022 +0100

libstdc++: Check for EOF if extraction avoids buffer overflow [PR106248]

In r11-2581-g17abcc77341584 (for LWG 2499) I added overflow checks to
the pre-C++20 operator>>(istream&, char*) overload.  Those checks can
cause extraction to stop after filling the buffer, where previously it
would have tried to extract another character and stopped at EOF. When
that happens we no longer set eofbit in the stream state, which is
consistent with the behaviour of the new C++20 overload, but is an
observable and unexpected change in the C++17 behaviour. What makes it
worse is that the behaviour change is dependent on optimization, because
__builtin_object_size is used to detect the buffer size and that only
works when optimizing.

To avoid the unexpected and optimization-dependent change in behaviour,
set eofbit manually if we stopped extracting because of the buffer size
check, but had reached EOF anyway. If the stream's rdstate() != goodbit
or width() is non-zero and smaller than the buffer, there's nothing to
do. Otherwise, we filled the buffer and need to check for EOF, and maybe
set eofbit.

The new check is guarded by #ifdef __OPTIMIZE__ because otherwise
__builtin_object_size is useless. There's no point compiling and
emitting dead code that can't be eliminated because we're not
optimizing.

We could add extra checks that the next character in the buffer is not
whitespace, to detect the case where we stopped early and prevented a
buffer overflow that would have happened otherwise. That would allow us
to assert or set badbit in the stream state when undefined behaviour was
prevented. However, those extra checks would increase the size of the
function, potentially reducing the likelihood of it being inlined, and
so making the buffer size detection less reliable. It seems preferable
to prevent UB and silently truncate, rather than miss the UB and allow
the overflow to happen.

libstdc++-v3/ChangeLog:

PR libstdc++/106248
* include/std/istream [C++17] (operator>>(istream&, char*)):
Set eofbit if we stopped extracting at EOF.
*
testsuite/27_io/basic_istream/extractors_character/char/pr106248.cc:
New test.
*
testsuite/27_io/basic_istream/extractors_character/wchar_t/pr106248.cc:
New test.

[Bug libstdc++/106248] [11/12 Regression] operator>>std::basic_istream at boundary condition behave differently in different opt levels

2022-07-12 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106248

Jonathan Wakely  changed:

   What|Removed |Added

  Known to fail||12.1.0
Summary|[11/12/13 Regression]   |[11/12 Regression]
   |operator>>std::basic_istrea |operator>>std::basic_istrea
   |m at boundary condition |m at boundary condition
   |behave differently in   |behave differently in
   |different opt levels|different opt levels

--- Comment #9 from Jonathan Wakely  ---
Fixed on trunk only so far.

[Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness

2022-07-12 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #2 from Jonathan Wakely  ---
The difference in behaviour is due to r12-6272-ge3ef832a9e8d6a

[Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness

2022-07-12 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #3 from Jonathan Wakely  ---
  template<typename _Hash>
    struct _Hashtable_hash_traits
    {
      static constexpr std::size_t
      __small_size_threshold() noexcept
      { return std::__is_fast_hash<_Hash>::value ? 0 : 20; }
    };

This new trait determines whether to just do a linear search in a small
container:

  template<typename _Key, typename _Value, typename _Alloc,
           typename _ExtractKey, typename _Equal,
           typename _Hash, typename _RangeHash, typename _Unused,
           typename _RehashPolicy, typename _Traits>
    auto
    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
               _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
    find(const key_type& __k)
    -> iterator
    {
      if (size() <= __small_size_threshold())
        {
          for (auto __it = begin(); __it != end(); ++__it)
            if (this->_M_key_equals(__k, *__it._M_cur))
              return __it;
          return end();
        }

      __hash_code __code = this->_M_hash_code(__k);
      std::size_t __bkt = _M_bucket_index(__code);
      return iterator(_M_find_node(__bkt, __k, __code));
    }

The __is_fast_hash trait is true for std::hash<std::string> but false for your
custom hash function.

[Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness

2022-07-12 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #4 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #3)
> The __is_fast_hash trait is true for std::hash<std::string> but false for
> your custom hash function.

Oops, I mean the other way around! std::hash<std::string> is considered slow,
your custom hash function is not.
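
The small-size path described in comment #3 can be modelled outside libstdc++ with a toy container. This is only a sketch under stated assumptions: ToyTable and its members are hypothetical names, the threshold is passed in by hand (20 for a "slow" hash, 0 for a "fast" one, mirroring __small_size_threshold), and unlike the real _Hashtable it does no duplicate checking or rehashing.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy model of a hash table with a small-size linear-search shortcut.
struct ToyTable
{
  std::size_t threshold;                          // 0 disables the shortcut
  std::vector<std::vector<std::string>> buckets;
  std::size_t size = 0;
  mutable std::size_t equal_calls = 0;            // key comparisons made
  mutable std::size_t hash_calls = 0;             // hash computations made

  explicit ToyTable (std::size_t thr, std::size_t nbuckets = 16)
    : threshold (thr), buckets (nbuckets) {}

  std::size_t bucket_of (const std::string &k) const
  {
    ++hash_calls;
    return std::hash<std::string>{} (k) % buckets.size ();
  }

  void insert (const std::string &k)
  {
    buckets[bucket_of (k)].push_back (k);
    ++size;
  }

  bool contains (const std::string &k) const
  {
    if (size <= threshold)
      {
        // Small container: compare against every element, never hash.
        for (const auto &b : buckets)
          for (const auto &e : b)
            {
              ++equal_calls;
              if (e == k)
                return true;
            }
        return false;
      }
    // Otherwise hash once and look only in the matching bucket.
    for (const auto &e : buckets[bucket_of (k)])
      {
        ++equal_calls;
        if (e == k)
          return true;
      }
    return false;
  }
};
```

With six keys and a threshold of 20, a failed lookup makes six comparisons and no hash computation, which matches the six equal_to(key1=foo, ...) lines in the report.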

[Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness

2022-07-12 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #5 from Jonathan Wakely  ---
This "bug" (i.e. doing a linear search for small containers using slow hash
functions) makes the find operation more than twice as fast:

BM_std_hash  4.67 ns 4.66 ns150610579
BM_custom_hash   12.0 ns 12.0 ns 57529241

See PR 68303 for more details.

[Bug preprocessor/106272] clang build: new warning ?

2022-07-12 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106272

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #4 from Marek Polacek  ---
(In reply to Eric Gallager from comment #3)
> Note that GCC has its own version of -Wpessimizing-move, too... any idea why
> clang's version of the flag catches it, but gcc's doesn't?

No, but I'm going to reduce libcpp/line-map.ii to create a testcase and file a
bug and since it's my warning, maybe even fix it.

[Bug preprocessor/106272] clang build: new warning ?

2022-07-12 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106272

Marek Polacek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #5 from Marek Polacek  ---
And while at it, why don't I fix this one.

[Bug c++/106276] New: Missing -Wpessimizing-move warning

2022-07-12 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106276

Bug ID: 106276
   Summary: Missing -Wpessimizing-move warning
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

Looks like we're missing a warning on the std::move line here:

namespace std {
template <typename _Tp> _Tp &&move(_Tp &&);
}
char take_buffer;
struct label_text {
  label_text take() { return std::move(label_text(&take_buffer)); }
  label_text(char *);
};

$ xclang++ -c -Wall -W line-map.ii
line-map.ii:6:30: warning: moving a temporary object prevents copy elision
[-Wpessimizing-move]
  label_text take() { return std::move(label_text(&take_buffer)); }
 ^
line-map.ii:6:30: note: remove std::move call here
  label_text take() { return std::move(label_text(&take_buffer)); }
 ^~~
$ xg++ -c -Wall -W line-map.ii
$

[Bug preprocessor/106272] clang build: new warning ?

2022-07-12 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106272

Marek Polacek  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=106276

--- Comment #6 from Marek Polacek  ---
(In reply to Marek Polacek from comment #4)
> (In reply to Eric Gallager from comment #3)
> > Note that GCC has its own version of -Wpessimizing-move, too... any idea why
> > clang's version of the flag catches it, but gcc's doesn't?
> 
> No, but I'm going to reduce libcpp/line-map.ii to create a testcase and file
> a bug and since it's my warning, maybe even fix it.

Bug 106276.

[Bug middle-end/106277] New: missed-optimization: redundant movzx

2022-07-12 Thread jl1184 at duke dot edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106277

Bug ID: 106277
   Summary: missed-optimization: redundant movzx
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jl1184 at duke dot edu
  Target Milestone: ---

I came across this when examining a loop that runs slower than I expected. It
involves explicit and implicit conversions between 8-bit and 32/64-bit values,
and as I looked through the generated assembly using Godbolt Compiler Explorer,
I found lots of movzx instructions that don't seem to break a dependency or
play a role in correctness. On top of that, many of them use the same register
as source and destination, like "movzx eax, al", so they cannot benefit from
mov-elimination.

I then tried some simple examples on Godbolt with X86-64 GCC 12.1, and found
that this behavior is persistent and easily reproducible, even when I specify
"-march=skylake". Here's an example:

#include <stdint.h>

int add2bytes(uint8_t* a, uint8_t* b) {
    return uint8_t(*a + *b);
}

gcc -O3 gives:

add2bytes(unsigned char*, unsigned char*):
        movzx   eax, BYTE PTR [rsi]
        add     al, BYTE PTR [rdi]
        movzx   eax, al
        ret

The first movzx here breaks dependency on old eax value, but what is the second
movzx doing? I don't think there's any dependency it can break, and it
shouldn't affect the result either.
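
To make the source-level semantics concrete, here is a small sketch of what the function has to compute. The helper `add2bytes_values` is a hypothetical wrapper added only so the function can be called with plain values:

```cpp
#include <cstdint>

// Same computation as the example above, written with an explicit cast.
// *a + *b is evaluated as int (integer promotion), the cast keeps only the
// low 8 bits, and the conversion back to int zero-extends that byte.
int add2bytes(const std::uint8_t* a, const std::uint8_t* b) {
    return static_cast<std::uint8_t>(*a + *b);
}

// Hypothetical convenience wrapper, not part of the original report.
int add2bytes_values(std::uint8_t a, std::uint8_t b) {
    return add2bytes(&a, &b);
}
```

The result is always in the range 0..255, for instance add2bytes_values(200, 100) wraps around to 44. This only illustrates the required result; it says nothing about which instruction sequence is fastest.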

I also asked this on Stack Overflow, and Peter Cordes has a great response
(https://stackoverflow.com/a/72953035/14730360) explaining how this extra movzx
is bad for the vast majority of x86-64 processors. IMHO newer versions of GCC
should give newer processors more weight in performance tradeoffs. Probably
-mtune=generic in a later GCC shouldn't care about P6-family partial-register
stalls. In practice, very few people should still be using those CPUs to run
newly compiled software.

Godbolt link with code for examples: https://godbolt.org/z/4n6ezaav7
Here's another example closer to what I was originally examining:

int foo(uint8_t* a, uint8_t i, uint8_t j) {
    return a[a[i] | a[j]];
}

gcc -O3 gives:

foo(unsigned char*, unsigned char, unsigned char):
        movzx   esi, sil
        movzx   edx, dl
        movzx   eax, BYTE PTR [rdi+rsi]
        or      al, BYTE PTR [rdi+rdx]
        movzx   eax, al
        movzx   eax, BYTE PTR [rdi+rax]
        ret

As discussed in the Stack Overflow post, the first two movzx instructions
should use different source and destination registers so that some CPUs can
benefit from mov-elimination.

The "movzx   eax, al" just seems unnecessary. The upper bits of RAX should
already be cleared, and the dependency of RAX on the "or" is not something that
"movzx eax, al" can break. So I think it's better to just do "movzx   eax, byte
ptr [rdi + rax]" right after the "or". Or maybe even better, just use "mov   eax,
byte ptr [rdi + rax]", since EAX should already be free, with its upper bits
cleared, at this point.
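
The claim that the upper bits are already cleared can be checked with a toy model of the register. This is only a sketch under simplifying assumptions: `Reg64` and the helper functions are hypothetical names, and the model tracks architectural values only, not partial-register dependencies or timing.

```cpp
#include <cstdint>

// Toy model of RAX. It starts with stale contents so that any upper bits
// left uncleared would show up in the result.
struct Reg64 {
    std::uint64_t value = 0xDEADBEEFCAFEF00DULL;

    // Writing a 32-bit register (e.g. EAX) zero-extends into all 64 bits.
    void write32(std::uint32_t v) { value = v; }

    // Writing an 8-bit register (e.g. AL) leaves bits 8..63 untouched.
    void write8(std::uint8_t v) { value = (value & ~std::uint64_t{0xFF}) | v; }

    std::uint8_t low8() const { return static_cast<std::uint8_t>(value & 0xFF); }
};

// movzx eax, BYTE PTR [..] ; or al, BYTE PTR [..]
std::uint64_t index_without_extra_movzx(std::uint8_t x, std::uint8_t y) {
    Reg64 rax;
    rax.write32(x);
    rax.write8(static_cast<std::uint8_t>(rax.low8() | y));
    return rax.value;
}

// Same sequence followed by: movzx eax, al
std::uint64_t index_with_extra_movzx(std::uint8_t x, std::uint8_t y) {
    Reg64 rax;
    rax.write32(x);
    rax.write8(static_cast<std::uint8_t>(rax.low8() | y));
    rax.write32(rax.low8());
    return rax.value;
}

// Exhaustively compares both sequences over all pairs of byte values.
bool extra_movzx_is_redundant() {
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            const auto bx = static_cast<std::uint8_t>(x);
            const auto by = static_cast<std::uint8_t>(y);
            if (index_without_extra_movzx(bx, by) != index_with_extra_movzx(bx, by)) {
                return false;
            }
        }
    }
    return true;
}
```

In this model both sequences leave the same value in RAX for every input pair, which matches the argument that the "movzx eax, al" does not change the index used by the final load.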

[Bug tree-optimization/106237] [13 regression] serveral tests begin ICEing starting with r13-1575-gcf3a120084e946

2022-07-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106237

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED

--- Comment #3 from Richard Biener  ---
(In reply to seurer from comment #2)
> They use whichever mcpu matches the machine.
> 
> The ICEs are fixed but there is a different problem introduced with your fix
> g:79f18ac6b7ab7744fcf8937ea4bc0c40f3efc629, r13-1599-g79f18ac6b7ab77

That just exposed what previously ICEd I think.

> make  -k check-gcc RUNTESTFLAGS="powerpc.exp=gcc.target/powerpc/pr56605.c"
> FAIL: gcc.target/powerpc/pr56605.c scan-rtl-dump-times combine
> "\\(compare:CC \\((?:and|zero_extend):(?:[SD]I) \\((?:sub)?reg:[SD]I" 1
> 
> which might just be the test case needing updating I suppose.  It occurs on
> the same machines as the original problem.  It still fails with current
> trunk.

I can see this with a cross to ppc64le as well; the pattern matches two times.
It's not clear to me what the testcase intends to test - it lacks a comment :/

c3d2600cfb476^ also exhibits this problem, and so does d2a89809^, so this
problem must have existed for a longer time and it seems unrelated to the issue
at hand.  Can you properly bisect and open a different bug report for this?

[Bug lto/106274] Loss of macro tracking information with -flto

2022-07-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106274

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2022-07-13
Version|unknown |12.1.0
   Keywords||diagnostic, lto
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
I think the point is that in free-lang-data we are bulldozing over data
structures that the frontend might not be happy about (in an attempt to make
the streamed IL small), so we try to reset all callbacks into frontend code
that might crash.

I'm not sure to what extent this is still required with respect to the
diagnostic context though - you'd have to try.

There's also the old long-standing TODO to perform this "frontend scrapping"
even when not using -flto, just to save on memory for the followup optimization
(but this runs into the same issue that late diagnostics then appear
"mangled").