[Bug c++/111286] [12/13/14 Regression] ICE on functional cast empty brace-init-list to const array reference

2023-09-13 Thread gayathri.gottumukkala.27 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111286

Gayathri Gottumukkala  changed:

   What|Removed |Added

 CC||gayathri.gottumukkala.27@gm
   ||ail.com

--- Comment #3 from Gayathri Gottumukkala  ---
I think this issue is related to the attempt to create a temporary array of
references, which is not allowed in C++.

In C++11 and later, you can find this information in section 8.3.2, paragraph 4
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf

According to the above document, "There shall be no references to references,
no arrays of references, and no pointers to references."

To address the compilation error without modifying the code structure, we can
make use of a temporary array of const A objects.

struct A {
  A() noexcept {}
};

void foo() {
  using T = const A[1];
  T{};
}
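
For readers comparing the quoted rule with the bug title, the following is a minimal sketch of how "array of references" differs from "reference to an array". The names Arr, RefToArr and bound_extent are illustrative only and do not come from the report.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct A {
  A() noexcept {}
};

using Arr = const A[1];            // array of const A
using RefToArr = const A (&)[1];   // reference to an array: a valid type
// using ArrOfRef = const A &[1];  // array of references: ill-formed

static_assert(std::is_array<Arr>::value, "Arr is an array type");
static_assert(std::is_reference<RefToArr>::value, "RefToArr is a reference");
static_assert(std::is_same<std::remove_reference<RefToArr>::type, Arr>::value,
              "RefToArr refers to the same array type as Arr");

// Binds a reference-to-array to a named array and returns the array extent.
std::size_t bound_extent() {
  Arr storage{};           // named version of the array used above
  RefToArr ref = storage;  // the reference denotes the whole array
  assert(&ref[0] == &storage[0]);
  return sizeof(ref) / sizeof(ref[0]);
}
```

The commented-out alias is the form the quoted sentence forbids; the other two aliases are accepted by the type system.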

[Bug libstdc++/111390] 'make check-compile' target is not useful

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111390

--- Comment #5 from Richard Biener  ---
Just to add a compile-only "override" would be useful to do bare testing of
cross compilers where no (or an incomplete) runtime is available to reduce
the amount of noise produced (you still get complaints about missing headers of
course).

Not sure if easily doable across all testsuites though.

I agree not so useful for libstdc++

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

Richard Biener  changed:

   What|Removed |Added

Version|unknown |13.1.0

--- Comment #7 from Richard Biener  ---
You might also want to update the compiler, GCC 13.1.0 is old, 13.2.0 exists
for quite a while.

[Bug libstdc++/111390] libstdc++-v3/scripts/check_compile script is not useful

2023-09-13 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111390

Jonathan Wakely  changed:

   What|Removed |Added

Summary|'make check-compile' target |libstdc++-v3/scripts/check_
   |is not useful   |compile script is not
   ||useful

--- Comment #6 from Jonathan Wakely  ---
I've updated the summary to be clear that this is about the current incarnation
of the libstdc++ feature, which has been broken since the default changed from
-std=gnu++98 to -std=gnu++14 many years ago, or even earlier.

[Bug tree-optimization/111397] Spurious warning "'({anonymous})' is used uninitialized" when calling a __returns_twice__ function (-Wuninitialized -O2)

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111397

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Last reconfirmed||2023-09-13
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
(In reply to Andrew Pinski from comment #1)
> Looks loop copy header change which allowed the warning not to happen.
> 
> The warning is about the argument of test_setjmpex. Because GCC does not
> realize __builtin_frame_address cannot jump to the test_setjmpex ...
> 
> In the case of GCC 12-13, the copy of the loop header happens during
> thread-full rather than earlier and inserts:
>   _4(ab) = _11(D);
> 
> Which is what is warned about.
> _11(D) does not get proped into the phi ...

We can't propagate because

  /* Similarly if DEST flows in from an abnormal edge then the copy cannot be
     propagated.  If we know we do not propagate into a PHI argument this
     does not apply.  */
  else if (!dest_not_phi_arg_p
           && TREE_CODE (dest) == SSA_NAME
           && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (dest))
    return false;

that's still not fine-grained enough - the case we cannot propagate is
when we propagate into a PHI argument for an abnormal edge.

The diagnostic doesn't happen on trunk, I still have a patch doing the
propagation.

[Bug target/111320] RISC-V: Failed combine extend + vfwredosum

2023-09-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111320

--- Comment #1 from JuzheZhong  ---
This affects not only in-order reduction but also unordered reduction:

https://godbolt.org/z/sn5jbWPbd

#include <stdint.h>
int
foo (int16_t * __restrict a, int n, int * __restrict cond)
{
  int r = 0;
  for (int i = 0; i < 8; i++)
    if (cond[i])
      r += a[i];
  return r;
}

int
foo2 (int16_t * __restrict a, int n, int * __restrict cond)
{
  int r = 0;
  for (int i = 0; i < 8; i++)
    r += a[i];
  return r;
}

Both fail to combine into a widening reduction.
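
As a scalar reference for what a widening reduction computes, here is a minimal sketch in which each 16-bit element is extended to int before it is accumulated. The helper names widening_sum and check_widening_sum are illustrative, and the sketch says nothing about which RISC-V instructions should be selected.

```cpp
#include <cassert>
#include <stdint.h>

// Widening sum: every int16_t element is promoted to int before the add,
// so the accumulator can hold values outside the 16-bit range.
int widening_sum(const int16_t *a, int n) {
  int r = 0;
  for (int i = 0; i < n; i++)
    r += a[i];
  return r;
}

// Small usage check with a total that does not fit in 16 bits.
void check_widening_sum() {
  const int16_t v[] = {30000, 30000, -5};
  assert(widening_sum(v, 3) == 59995);
}
```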

[Bug c/111398] GCC should warn if a struct with flexible array member is declared static or onstack

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111398

Richard Biener  changed:

   What|Removed |Added

   Keywords||diagnostic
Version|unknown |14.0
   Severity|normal  |enhancement

--- Comment #1 from Richard Biener  ---
The C standard explicitly makes these cases valid.  I agree it's probably
unintended in most cases.

[Bug c++/111399] New: Sanitizer code generation smarter than warnings

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111399

Bug ID: 111399
   Summary: Sanitizer code generation smarter than warnings
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

Given this code :

int sign(int x) {
    if (x < 0) return -1;
    if (x == 0) return 0;
    if (x > 0) return 1;
}


and compiled with "-O2 -Wall", gcc is unable to see that all possible cases for
"x" are covered, so it generates a "control reaches end of non-void function
[-Wreturn-type]" warning.  It would be nice if gcc could see this is a false
positive, but analysis and warnings can't be perfect.

However, if I add the flag "-fsanitize=undefined", the compiler is smart enough
to see that all cases are covered, and there is no call to
__ubsan_handle_missing_return generated.

If the sanitizer code generation can see that all cases are covered, why can't
the -Wreturn-type warning detection?  I'm guessing it comes down to the
ordering of compiler passes and therefore the level of program analysis
information available at that point.  But perhaps the -Wreturn-type pass could
be done later when the information is available?
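
For comparison, a sketch of a source-level workaround: making the last case an unconditional return means no path reaches the closing brace, so there is nothing for -Wreturn-type to report. This sidesteps the request above rather than addressing it, and check_sign is an illustrative helper name.

```cpp
#include <cassert>

// Same three-way result as sign() in the report, but the final case is
// unconditional, so control can never reach the end of the function.
int sign(int x) {
  if (x < 0) return -1;
  if (x == 0) return 0;
  return 1;
}

// Small usage check covering all three branches.
void check_sign() {
  assert(sign(-7) == -1);
  assert(sign(0) == 0);
  assert(sign(42) == 1);
}
```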

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #31 from Richard Biener  ---
On GIMPLE an "undefined" operand representation would be the default definition
of an SSA name with the appropriate type.  That's a somewhat "heavy"
representation and it also doesn't fit the target hook return value nicely,
but we could handle a NULL_TREE return value from the target hook in the
way to create such SSA name.

[Bug ada/110488] [13/14 regression] legal deferred constant rejected

2023-09-13 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110488

Eric Botcazou  changed:

   What|Removed |Added

Summary|Legal deferred constant |[13/14 regression] legal
   |rejected|deferred constant rejected
   Target Milestone|--- |13.3
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-09-13
 CC||ebotcazou at gcc dot gnu.org

--- Comment #1 from Eric Botcazou  ---
This compiles up to GCC 12.

[Bug ada/110488] [13/14 regression] legal deferred constant rejected

2023-09-13 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110488

Eric Botcazou  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ebotcazou at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Eric Botcazou  ---
Investigating.

[Bug c/111400] New: Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

Bug ID: 111400
   Summary: Missing return sanitization only works in C++
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: david at westcontrol dot com
  Target Milestone: ---

With C++ and -fsanitize=return, the code :

int foo(void) { }

generates a call to __ubsan_handle_missing_return.

For C, there is no sanitizer call - just a simple "ret" instruction.

This is, of course, because in C (unlike C++), falling off the end of a
non-void function is legal and defined behaviour, as long as caller code does
not try to use the non-existent return value.  But just like in C++, it is
almost certainly an error in the C code if control flow ever falls off the end
of a non-void function.

Could -fsanitize=return be added to C?  It should not be included by
-fsanitize=undefined in C, since the behaviour is actually allowed, but it
would still be a useful option that could be enabled individually.
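
Until such an option exists, a hand-written stand-in is possible. The sketch below uses the GCC/Clang builtin __builtin_trap at the point where control would otherwise fall off the end; it is only a manual approximation and not what the sanitizer itself emits. The function name classify is illustrative.

```cpp
#include <cassert>

// Every intended path returns explicitly. If control ever reaches the end
// anyway, the builtin aborts the program instead of returning garbage.
int classify(int x) {
  if (x < 0) return -1;
  if (x >= 0) return 1;
  __builtin_trap();  // GCC/Clang builtin; never returns
}

// Small usage check on both reachable branches.
void check_classify() {
  assert(classify(-3) == -1);
  assert(classify(0) == 1);
}
```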

[Bug tree-optimization/111397] Spurious warning "'({anonymous})' is used uninitialized" when calling a __returns_twice__ function (-Wuninitialized -O2)

2023-09-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111397

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:92ea12ea99fce546772a40b7bbc2ea850db9b1be

commit r14-3916-g92ea12ea99fce546772a40b7bbc2ea850db9b1be
Author: Richard Biener 
Date:   Wed Sep 13 09:28:34 2023 +0200

tree-optimization/111397 - missed copy propagation involving abnormal dest

The following extends the previous enhancement to copy propagation
involving abnormals.  We can easily replace abnormal uses by not
abnormal uses and only need to preserve the abnormals in PHI arguments
flowing in from abnormal edges.  This changes the may_propagate_copy
argument indicating we are not propagating into a PHI node to indicate
whether we know we are not propagating into a PHI argument from an
abnormal PHI instead.

PR tree-optimization/111397
* tree-ssa-propagate.cc (may_propagate_copy): Change optional
argument to specify whether the PHI destination doesn't flow in
from an abnormal PHI.
(propagate_value): Adjust.
* tree-ssa-forwprop.cc (pass_forwprop::execute): Indicate abnormal
PHI dest.
* tree-ssa-sccvn.cc (eliminate_dom_walker::before_dom_children):
Likewise.
(process_bb): Likewise.

* gcc.dg/uninit-pr111397.c: New testcase.

[Bug tree-optimization/111397] [12/13 Regression] Spurious warning "'({anonymous})' is used uninitialized" when calling a __returns_twice__ function (-Wuninitialized -O2)

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111397

Richard Biener  changed:

   What|Removed |Added

  Known to fail||12.3.1, 13.2.1
Summary|Spurious warning|[12/13 Regression] Spurious
   |"'({anonymous})' is used|warning "'({anonymous})' is
   |uninitialized" when calling |used uninitialized" when
   |a __returns_twice__ |calling a __returns_twice__
   |function (-Wuninitialized   |function (-Wuninitialized
   |-O2)|-O2)
   Priority|P3  |P2
   Target Milestone|--- |12.4
 Blocks||24639
  Known to work||11.4.1, 14.0

--- Comment #4 from Richard Biener  ---
Fixed on trunk, I know it works on the 13 branch, eventually will backport.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24639
[Bug 24639] [meta-bug] bug to track all Wuninitialized issues

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #32 from JuzheZhong  ---
(In reply to Richard Biener from comment #31)
> On GIMPLE an "undefined" operand representation would be the default
> definition of an SSA name with the appropriate type.  That's a somewhat
> "heavy" representation and it also doesn't fit the target hook return value
> nicely,
> but we could handle a NULL_TREE return value from the target hook in the
> way to create such SSA name.

Thanks Richi.

How is this special SSA name represented in RTX, and how can I recognize that
it is an "undefined" value at the "expand" stage?

I am wondering whether my approach (passing a scalar 0 as the ELSE value, which
is easily recognized in the RTL backend at the "expand" stage) is suitable.

As you can see, there is one more move instruction inside the loop, which hurts
vector performance a lot, so I want to find a quick way to fix it for now.

[Bug c++/111379] comparison between unequal pointers to void should be illegal during constant evaluation

2023-09-13 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111379

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #2 from Xi Ruoyao  ---
(In reply to Jiang An from comment #1)
> There's (or will be) a new DR CWG2749 which tentatively ready now.
> https://cplusplus.github.io/CWG/issues/2749.html
> 
> It seems that the old resolution in CWG2526 was wrong, and the comparison
> should be constexpr-friendly.
> 
> BTW I don't think there was anything specifying that "the comparison would
> have *undefined* behaviour" before CWG2526.

It is (or was) unspecified, not undefined.  And the standard explicitly
disallows "a relational operator where the result is unspecified" in
[expr.const].

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #33 from JuzheZhong  ---
Would the following approach be reasonable?


ELSE VALUE = make_temp_ssa_name (vectype, NULL, "undefine_");

Then in the later "expand" stage:


define_expand "cond_len_xxx"
...


if (REG_EXPR (operand) == "undefine") {
  gen rvv insns with no else value
}

Thanks.

[Bug c++/111379] comparison between unequal pointers to void should be illegal during constant evaluation

2023-09-13 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111379

--- Comment #3 from Xi Ruoyao  ---
If CWG 2749 is accepted we should just close this as WONTFIX.

[Bug middle-end/111324] More optimization about "(X * Y) / Y"

2023-09-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

--- Comment #4 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #3)
> A patch is posted:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629534.html
That patch is not for this PR. Sorry for the mistake.

[Bug middle-end/111324] More optimization about "(X * Y) / Y"

2023-09-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111324

--- Comment #5 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #2)
> Confirmed. 
> 
> So using the local range in this case is ok. There might be only a few times
> we don't want to use it though in match.

Agreed, "get_range_query" would be more useful in most cases.


From a quick look at match.pd, there are two other patterns that use
"get_global_range_query".

I have some concerns about those patterns, so they may not need to be
updated:

* (T)(A)+cst -->(T)(A+cst): I'm wondering whether this transformation is really
beneficial on PPC.
e.g. "return (long) x1 + 40;" could save one "extend-insn" compared with "return
(long)(x1 + 40);"

* For pattern "((x * cst) + cst1) * cst2": it seems this pattern does not
affect any cases, i.e. this optimization is already done by other parts (before
match.pd).
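
The PR's own testcase is not quoted in this thread, so the following is only an assumed illustration of why range information matters for "(X * Y) / Y". When both operands are known to be small enough that the product cannot wrap, the division returns X; when the product wraps, it does not. All function names are hypothetical.

```cpp
#include <cassert>
#include <climits>

static_assert(UINT_MAX == 0xffffffffu, "sketch assumes 32-bit unsigned");

// Both operands are limited to 16 bits, so xs * ys fits in 32 bits and the
// division recovers xs exactly. ys is forced to be nonzero.
unsigned mul_then_div(unsigned x, unsigned y) {
  unsigned xs = x & 0xffffu;
  unsigned ys = (y & 0xffffu) | 1u;
  return (xs * ys) / ys;
}

// Without a range restriction the product may wrap, and then
// (x * y) / y is no longer equal to x.
unsigned wrapping_mul_then_div(unsigned x, unsigned y) {
  return (x * y) / y;
}

void check_mul_then_div() {
  assert(mul_then_div(0x12345678u, 0x9abcdef0u) == 0x5678u);
  assert(wrapping_mul_then_div(0x80000000u, 2u) == 0u);
}
```

The first function is the kind of case where a range query local to the statement can prove the simplification safe; the second shows why the simplification cannot be applied unconditionally.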

[Bug tree-optimization/111387] ICE on valid code at -O2 and -O3: verify_ssa failed

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111387

--- Comment #2 from Richard Biener  ---
The main issue is that when BB SLP splits the function when we already entered
a cycle we have to go back and split at the containing cycles (recursively)
because otherwise we can end up seeing "externals" from backedges.

This particular testcase can be fixed by iterating over blocks in a better
way but the general issue remains that the defensive code in
vect_get_and_check_slp_defs relies on dominance info which doesn't work well
for irreducible regions we are faced with here.

The testcase also shows the recent honoring of ->dont_vectorize pessimizes
the cases where we applied if-conversion for loop vectorization since that
sets ->dont_vectorize on the original loop and that never gets cleared.

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2023-09-13
 Ever confirmed|0   |1
 CC||jsm28 at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
Version|unknown |14.0
   Severity|normal  |enhancement

--- Comment #1 from Richard Biener  ---
Confirmed.  Note C17 disallows a return without an expression for a function
that returns a value, not sure if that means falling off the function without a
return (value) is still OK, it at least feels inconsistent.

[Bug c++/111399] Bogus -Wreturn-type diagnostic

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111399

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Keywords||diagnostic
Summary|Sanitizer code generation   |Bogus -Wreturn-type
   |smarter than warnings   |diagnostic
   Last reconfirmed||2023-09-13
Version|unknown |14.0

--- Comment #1 from Richard Biener  ---
We do instrument the missed return but it gets later optimized away.

[Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

Bug ID: 111401
   Summary: Middle-end: Missed optimization of
MASK_LEN_FOLD_LEFT_PLUS
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

There is a case where I think the loop vectorizer misses an optimization:

https://godbolt.org/z/x5sjdenhM

double
foo2 (double *__restrict a,
      double init,
      int *__restrict cond,
      int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      init += a[i];
  return init;
}

It generates the following GIMPLE IR:

_60 = .SELECT_VL (ivtmp_58, 4);
...
vect__ifc__35.14_56 = .VCOND_MASK (mask__23.10_50, vect__8.13_54, { 0.0, 0.0, 0.0, 0.0 });
_36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.14_56, { -1, -1, -1, -1 }, _60, 0);

The mask of MASK_LEN_FOLD_LEFT_PLUS is the dummy mask {-1, -1, ..., -1}.
I think we should forward the mask of VCOND_MASK into the
MASK_LEN_FOLD_LEFT_PLUS.

Then we can eliminate the VCOND_MASK.


I don't know where the best place to do this optimization is.

Should it be in match.pd, or in the loop vectorizer code?

Thanks.

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

--- Comment #2 from David Brown  ---
(In reply to Richard Biener from comment #1)
> Confirmed.  Note C17 disallows a return without an expression for a function
> that returns a value, not sure if that means falling off the function
> without a return (value) is still OK, it at least feels inconsistent.

This has all remained unchanged from C99 to C23 (draft), I believe, which makes
things easier!

As far as I can tell, the relevant point in the standards is 6.9.1p12,
"Function definitions", which says "Unless otherwise specified, if the } that
terminates a function is reached, and the value of the function call is used by
the caller, the behaviour is undefined".  

So while a non-void function cannot have a return statement without an
expression (6.8.6.4p1), control flow /can/ run off the terminating }.  I think
this is perhaps a concession to older pre-void C code, when a function that
does not have a return value would still be declared to return "int".

Thus I think gcc's lack of a sanitizer here is technically accurate - but not
helpful, unless you are working with 35 year old code!

[Bug target/111354] [7/10/12 regression] The instructions of the DPDK demo program are different and run time increases.

2023-09-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #5 from Hongtao.liu  ---
void
rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
    __m256i ymm0, ymm1, ymm2, ymm3;

    while (n >= 128) {
        ymm0 = _mm256_loadu_si256((const __m256i *)(const void *)
                                  ((const uint8_t *)src + 0 * 32));
        n -= 128;
        ymm1 = _mm256_loadu_si256((const __m256i *)(const void *)
                                  ((const uint8_t *)src + 1 * 32));
        ymm2 = _mm256_loadu_si256((const __m256i *)(const void *)
                                  ((const uint8_t *)src + 2 * 32));
        ymm3 = _mm256_loadu_si256((const __m256i *)(const void *)
                                  ((const uint8_t *)src + 3 * 32));
        src = (const uint8_t *)src + 128;
        _mm256_storeu_si256((__m256i *)(void *)
                            ((uint8_t *)dst + 0 * 32), ymm0);
        _mm256_storeu_si256((__m256i *)(void *)
                            ((uint8_t *)dst + 1 * 32), ymm1);
        _mm256_storeu_si256((__m256i *)(void *)
                            ((uint8_t *)dst + 2 * 32), ymm2);
        _mm256_storeu_si256((__m256i *)(void *)
                            ((uint8_t *)dst + 3 * 32), ymm3);
        dst = (uint8_t *)dst + 128;
    }
}

I'm curious whether we can distribute the loop above into a memmove (of course,
the compiler needs to know that the two arrays don't alias each other).

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #34 from Richard Biener  ---
The ELSE value of type TYPE would be constructed like

 tree var = create_tmp_var (type);
 tree else_val = get_or_create_ssa_default_def (cfun, var);

I'm not sure const0_rtx is a good representation on RTL - how would
you distinguish that from a conditional operation on an integer vector
with else value zero?  Say for an integer division?

 for (i)
   if (f[i])
     y[i] = x[i] / z[i];
   else
     y[i] = 0;

we don't have a separate "else" value for elements cut off via 'len'
vs. elements cut off via 'mask'.

On RTL there are "special" RTXen used for this kind of stuff, like
(use:mode ..) or (clobber const0_rtx), but I'm the wrong person to
ask which one would be most appropriate for a general operand in
an otherwise generic instruction.  Maybe Richard has a guess.

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #35 from Richard Biener  ---
(In reply to Richard Biener from comment #34)
> The ELSE value of type TYPE would be constructed like
> 
>  tree var = create_tmp_var (type);
>  tree else_val = get_or_create_ssa_default_def (cfun, var);

Oh, and you recognize that at expansion by

  TREE_CODE (else_val) == SSA_NAME
  && SSA_NAME_IS_DEFAULT_DEF (else_val)
  && VAR_P (SSA_NAME_VAR (else_val))

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

--- Comment #3 from Andreas Schwab  ---
You already have -W[error=]return-type.

[Bug c/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-09-13

--- Comment #1 from Richard Biener  ---
The vectorizer sees if-converted code like

   [local count: 955630224]:
  # init_20 = PHI <_36(8), init_12(D)(18)>
  # i_22 = PHI 
  _1 = (long unsigned int) i_22;
  _2 = _1 * 4;
  _3 = cond_15(D) + _2;
  _4 = *_3;
  _23 = _4 != 0;
  _6 = _1 * 8;
  _38 = _37 + _6;
  _7 = (double *) _38;
  _8 = .MASK_LOAD (_7, 64B, _23);
  _ifc__35 = _23 ? _8 : 0.0;
  _36 = init_20 + _ifc__35;
  i_18 = i_22 + 1;
  if (n_13(D) > i_18)

so what it produces matches up here.  There's the possibility to
modify the if-conversion handling to use a COND_ADD instead of
the COND_EXPR plus ADD, I think that would be the best thing here.
See tree-if-conv.cc:is_cond_scalar_reduction/convert_scalar_cond_reduction

I think this is also wrong code when signed zeros are involved.
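
The signed-zero concern can be reproduced in scalar form. The sketch below compares the if-converted shape (select 0.0, then add) with the shape of the source loop (add only when the condition holds). The function names are illustrative, and the result assumes default floating-point options, i.e. round-to-nearest and no -ffast-math or -fno-signed-zeros.

```cpp
#include <cassert>
#include <cmath>

// Shape seen after if-conversion: inactive lanes contribute +0.0.
double select_then_add(double init, double a, bool cond) {
  double addend = cond ? a : 0.0;
  return init + addend;
}

// Shape of the source loop: inactive iterations leave init untouched.
double conditional_add(double init, double a, bool cond) {
  return cond ? init + a : init;
}

void check_signed_zero() {
  // -0.0 + 0.0 is +0.0 under round-to-nearest, so the sign is lost.
  assert(!std::signbit(select_then_add(-0.0, 1.0, false)));
  // The conditional form keeps -0.0 as it was.
  assert(std::signbit(conditional_add(-0.0, 1.0, false)));
}
```

A COND_ADD style operation corresponds to the second function, which is one reason it would avoid the problem.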

[Bug middle-end/111402] New: Loop distribution fail to optimize memmove for multiple consecutive moves within a loop

2023-09-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111402

Bug ID: 111402
   Summary: Loop distribution fail to optimize memmove for
multiple consecutive moves within a loop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: crazylht at gmail dot com
  Target Milestone: ---

cat test.c

typedef long long v4di __attribute__((vector_size(32)));

void
foo (v4di* __restrict a, v4di *b, int n)
{
  for (int i = 0; i != n; i++)
    a[i] = b[i];
}

void
foo1 (v4di* __restrict a, v4di *b, int n)
{
  for (int i = 0; i != n; i+=2)
    {
      a[i] = b[i];
      a[i+1] = b[i+1];
    }
}


gcc -O2 -S test.c

GCC can optimize the loop in foo to memmove, but not the loop in foo1.
This is from PR111354.
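
To make the expected result concrete, here is a sketch of the single call that the foo1 loop is equivalent to. It assumes n is even and non-negative (otherwise the `i != n` loop would not terminate normally); foo1_as_memmove and check_equivalence are illustrative names, not part of the report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef long long v4di __attribute__((vector_size(32)));

// Loop from the report: two consecutive vector copies per iteration.
void foo1(v4di *__restrict a, v4di *b, int n) {
  for (int i = 0; i != n; i += 2) {
    a[i] = b[i];
    a[i + 1] = b[i + 1];
  }
}

// Hand-written equivalent for even, non-negative n: one block move.
void foo1_as_memmove(v4di *a, const v4di *b, int n) {
  std::memmove(a, b, static_cast<std::size_t>(n) * sizeof(v4di));
}

// Runs both versions on the same input and compares the raw bytes.
void check_equivalence() {
  v4di src[4], d1[4], d2[4];
  std::memset(src, 0x5a, sizeof src);
  std::memset(d1, 0, sizeof d1);
  std::memset(d2, 0, sizeof d2);
  foo1(d1, src, 4);
  foo1_as_memmove(d2, src, 4);
  assert(std::memcmp(d1, d2, sizeof d1) == 0);
}
```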

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #36 from JuzheZhong  ---
(In reply to Richard Biener from comment #34)
> The ELSE value of type TYPE would be constructed like
> 
>  tree var = create_tmp_var (type);
>  tree else_val = get_or_create_ssa_default_def (cfun, var);
> 
> I'm not sure const0_rtx is a good representation on RTL - how would
> you distinguish that from a conditional operation on an integer vector
> with else value zero?  Say for an integer division?

My current approach is to pass a scalar 0 as the ELSE value.

So I relax the operand predicate of the cond_len else operand:

it can be either a register_operand with a vector mode or a const_int 0 (note
that it can't be a CONST_VECTOR).

So I can distinguish the else operand: if it is a scalar const_int 0, it is
undefined. Otherwise, it is always a register operand with a vector mode.

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #37 from JuzheZhong  ---
(In reply to Richard Biener from comment #35)
> (In reply to Richard Biener from comment #34)
> > The ELSE value of type TYPE would be constructed like
> > 
> >  tree var = create_tmp_var (type);
> >  tree else_val = get_or_create_ssa_default_def (cfun, var);
> 
> Oh, and you recognize that at expansion by
> 
>   TREE_CODE (else_val) == SSA_NAME
>   && SSA_NAME_IS_DEFAULT_DEF (else_val)
>   && VAR_P (SSA_NAME_VAR (else_val))

Oh. Sounds good. I will have a try.

[Bug target/110751] RISC-V: Suport undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

--- Comment #38 from rguenther at suse dot de  ---
On Wed, 13 Sep 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
> 
> --- Comment #36 from JuzheZhong  ---
> (In reply to Richard Biener from comment #34)
> > The ELSE value of type TYPE would be constructed like
> > 
> >  tree var = create_tmp_var (type);
> >  tree else_val = get_or_create_ssa_default_def (cfun, var);
> > 
> > I'm not sure const0_rtx is a good representation on RTL - how would
> > you distinguish that from a conditional operation on an integer vector
> > with else value zero?  Say for an integer division?
> 
> My current approach is that I passed scalar 0 to the ELSE VALUE.
> 
> So in the I relax the operand predicate of the cond_len else operand:
> 
> it can be either a register_operand has VECTOR_MODE or a const_int 0 (Note that
> it can't be the CONST_VECTOR).

I see.

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread david at westcontrol dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

--- Comment #4 from David Brown  ---
(In reply to Andreas Schwab from comment #3)
> You already have -W[error=]return-type.

Yes, and that is what I normally use - I am a big fan of gcc's static warnings.

Sometimes, however, there are false positives, or perhaps other reasons why the
programmer thinks it is safe to ignore the warning in a particular case.  Then
sanitizers can be a useful run-time fault-finding aid.  There's certainly a lot
of overlap in the kinds of mistakes that can be found with -Wreturn-type and
with -fsanitize=return, but there are still benefits in having both.  (You
have both in C++, just not in C.)

[Bug target/111403] New: LoongArch: Wrong code with -O -mlasx -fopenmp-simd

2023-09-13 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111403

Bug ID: 111403
   Summary: LoongArch: Wrong code with -O -mlasx -fopenmp-simd
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xry111 at gcc dot gnu.org
  Target Milestone: ---

Testcase:

struct S
{
  int s;
  S () : s (0) {}
  ~S () {}
  S (const S &x) { s = x.s; }
  S &
  operator= (const S &x)
  {
    s = x.s;
    return *this;
  }
};

S r, a[1024], b[1024];

#pragma omp declare reduction(+ : S : omp_out.s += omp_in.s)

__attribute__ ((noipa)) void
foo (S *a, S *b)
{
#pragma omp simd reduction(inscan, + : r)
  for (int i = 0; i < 1024; i++)
    {
      r.s += a[i].s;
#pragma omp scan inclusive(r)
      b[i] = r;
    }
}

int
main ()
{
  S s;
  for (int i = 0; i < 1024; ++i)
    {
      a[i].s = i;
      b[i].s = -1;
    }
  foo (a, b);
  if (r.s != 1024 * 1023 / 2)
    __builtin_abort ();
  return 0;
}

$ g++ t.cc -O -mlasx -fopenmp-simd
$ ./a.out
Aborted (core dumped)
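
For reference, the value the testcase checks for can be computed by a plain serial inclusive scan. This is a sketch on int arrays rather than on struct S, and inclusive_scan_reference and check_scan are illustrative names.

```cpp
#include <cassert>

// Serial inclusive scan: b[i] receives the running total including a[i],
// and the final total is returned.
int inclusive_scan_reference(const int *a, int *b, int n) {
  int r = 0;
  for (int i = 0; i < n; i++) {
    r += a[i];
    b[i] = r;
  }
  return r;
}

// Same input as the testcase: a[i] = i for 1024 elements.
void check_scan() {
  static int a[1024], b[1024];
  for (int i = 0; i < 1024; ++i)
    a[i] = i;
  int total = inclusive_scan_reference(a, b, 1024);
  assert(total == 1024 * 1023 / 2);
  assert(b[1023] == total);
}
```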

[Bug target/111403] LoongArch: Wrong code with -O -mlasx -fopenmp-simd

2023-09-13 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111403

Xi Ruoyao  changed:

   What|Removed |Added

   Keywords||wrong-code
 CC||chenglulu at loongson dot cn,
   ||chenxiaolong at loongson dot cn
 Target||loongarch*-*-*

--- Comment #1 from Xi Ruoyao  ---
FWIW the test case is reduced from g++.dg/vect/simd-2.cc.  Interestingly, if
we remove the definition of S::operator=, the issue no longer happens.

[Bug middle-end/111402] Loop distribution fail to optimize memmove for multiple consecutive moves within a loop

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111402

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-09-13
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
I think we have a duplicate bugreport for this.  Confirmed.

[Bug tree-optimization/111387] ICE on valid code at -O2 and -O3: verify_ssa failed

2023-09-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111387

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:04238615bba435f0b0ca7b263ad2c6bdb596e865

commit r14-3920-g04238615bba435f0b0ca7b263ad2c6bdb596e865
Author: Richard Biener 
Date:   Wed Sep 13 11:04:31 2023 +0200

tree-optimization/111387 - BB SLP and irreducible regions

When we split an irreducible region for BB vectorization analysis
the defensive handling of external backedge defs in
vect_get_and_check_slp_defs doesn't work since that relies on
dominance info to identify a backedge.  The testcase also shows
we are iterating over the function in a sub-optimal way which is
why we split the irreducible region in the first place.  The fix
is to mark backedges and use EDGE_DFS_BACK to identify them and
to use the region RPO compute which can produce a RPO order keeping
cycles in a better order (and as side effect marks backedges).

PR tree-optimization/111387
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Check
EDGE_DFS_BACK when doing BB vectorization.
(vect_slp_function): Use rev_post_order_and_mark_dfs_back_seme
to compute RPO and mark backedges.

* gcc.dg/torture/pr111387.c: New testcase.

[Bug tree-optimization/111387] ICE on valid code at -O2 and -O3: verify_ssa failed

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111387

Richard Biener  changed:

   What|Removed |Added

  Known to work||14.0

--- Comment #4 from Richard Biener  ---
Fixed for trunk.  The issue is latent but more difficult to trigger on the
branch(es); a change less likely to affect code generation would be to
call mark_dfs_back_edges () and not change the iteration order.

[Bug tree-optimization/111345] `X % Y is smaller than Y.` pattern could be simpified

2023-09-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111345

--- Comment #2 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:635a34e2be67d709088c31573732dfdf733e4cec

commit r14-3921-g635a34e2be67d709088c31573732dfdf733e4cec
Author: Andrew Pinski 
Date:   Tue Sep 12 10:43:23 2023 -0700

MATCH: Simplify `(X % Y) < Y` pattern.

This merges the two patterns to catch
`(X % Y) < Y` and `Y > (X % Y)` into one by
using :c on the comparison operator.
It does not change any code generation nor
anything else. It is more to allow for better
maintainability of this pattern.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111345
* match.pd (`Y > (X % Y)`): Merge
into ...
(`(X % Y) < Y`): Pattern by adding `:c`
on the comparison.
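
For illustration only, a minimal C sketch of the two source-level spellings that the merged pattern covers. The function names rem_lt and gt_rem are hypothetical and are not part of the commit; the sketch shows only that both spellings compute the same comparison, not how match.pd expresses the pattern.

```c
/* Hypothetical helpers: two spellings of the same comparison.
   For unsigned operands and y != 0, the remainder x % y is always
   strictly less than y, so both functions return 1.  */
int
rem_lt (unsigned x, unsigned y)
{
  return (x % y) < y;
}

int
gt_rem (unsigned x, unsigned y)
{
  return y > (x % y);
}
```

Because the operands are merely swapped along with the comparison direction, a single commutative pattern can describe both forms.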

[Bug tree-optimization/111345] `X % Y is smaller than Y.` pattern could be simpified

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111345

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Andrew Pinski  ---
Fixed.

[Bug tree-optimization/111364] `MAX_EXPR <= a` is not optimized to `a >= b`

2023-09-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111364

--- Comment #6 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:06bedc3860d3e61857d72ffe699f79ed5c92855f

commit r14-3922-g06bedc3860d3e61857d72ffe699f79ed5c92855f
Author: Andrew Pinski 
Date:   Tue Sep 12 05:16:06 2023 +

MATCH: [PR111364] Add some more minmax cmp operand simplifications

This adds a few more minmax cmp operand simplifications which were missed
before.
`MIN(a,b) < a` -> `a > b`
`MIN(a,b) >= a` -> `a <= b`
`MAX(a,b) > a` -> `a < b`
`MAX(a,b) <= a` -> `a >= b`

OK? Bootstrapped and tested on x86_64-linux-gnu.

Note gcc.dg/pr96708-negative.c needed to be updated to remove the
check for MIN/MAX as they have been optimized (correctly) away.

PR tree-optimization/111364

gcc/ChangeLog:

* match.pd (`MIN (X, Y) == X`): Extend
to min/lt, min/ge, max/gt, max/le.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/minmaxcmp-1.c: New test.
* gcc.dg/tree-ssa/minmaxcmp-2.c: New test.
* gcc.dg/pr96708-negative.c: Update testcase.
* gcc.dg/pr96708-positive.c: Add comment about `return 0`.
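
As a sanity check of the four identities listed in the commit message, here is a small C sketch that verifies them by brute force over a range of values. The helpers min_i, max_i and the two checking functions are hypothetical names introduced for this illustration; they are not taken from the new testcases.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MIN_EXPR / MAX_EXPR on int.  */
static inline int min_i (int a, int b) { return a < b ? a : b; }
static inline int max_i (int a, int b) { return a > b ? a : b; }

/* Returns true if all four identities from the commit message hold
   for the pair (a, b).  */
bool
minmax_identities_hold (int a, int b)
{
  return ((min_i (a, b) <  a) == (a >  b))
      && ((min_i (a, b) >= a) == (a <= b))
      && ((max_i (a, b) >  a) == (a <  b))
      && ((max_i (a, b) <= a) == (a >= b));
}

/* Checks every pair (a, b) with lo <= a, b <= hi.  */
bool
minmax_identities_hold_in_range (int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    for (int b = lo; b <= hi; b++)
      if (!minmax_identities_hold (a, b))
        return false;
  return true;
}
```

Calling minmax_identities_hold_in_range (-8, 8) exercises the equal, smaller, and larger cases for both operands.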

[Bug tree-optimization/111364] `MAX_EXPR <= a` is not optimized to `a >= b`

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111364

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0
 Resolution|--- |FIXED

--- Comment #7 from Andrew Pinski  ---
Fixed.

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #2 from Robin Dapp  ---
With the current trunk we don't spill anymore:

(VLS)
.L4:
vle32.v v2,0(a5)
vadd.vv v1,v1,v2
addi a5,a5,16
bne a5,a4,.L4

Considering just that loop I'd say costing works as designed.  Even though the
epilog and boilerplate code seems "crude" the main loop is as short as it can
be and is IMHO preferable.

.L3:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3

This has 6 instructions (disregarding the jump) and can't be faster than the 3
instructions for the VLS loop.  Provided we iterate often enough the VLS loop
should always be a win.

Regarding "looking slow" - I think ideally we would have the VLS loop followed
directly by the VLA loop for the residual iterations and next to no additional
statements.  That would require changes in the vectorizer, though.

In total: I think the current behavior is reasonable.

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-09-13 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #3 from JuzheZhong  ---
(In reply to Robin Dapp from comment #2)
> With the current trunk we don't spill anymore:
> 
> (VLS)
> .L4:
>   vle32.v v2,0(a5)
>   vadd.vv v1,v1,v2
>   addi a5,a5,16
>   bne a5,a4,.L4
> 
> Considering just that loop I'd say costing works as designed.  Even though
> the epilog and boilerplate code seems "crude" the main loop is as short as
> it can be and is IMHO preferable.
> 
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma
> slli a4,a5,2
> sub a1,a1,a5
> vle32.v v2,0(a0)
> add a0,a0,a4
> vadd.vv v1,v2,v1
> bne a1,zero,.L3
> 
> This has 6 instructions (disregarding the jump) and can't be faster than the
> 3 instructions for the VLS loop.  Provided we iterate often enough the VLS
> loop should always be a win.
> 
> Regarding "looking slow" - I think ideally we would have the VLS loop
> followed directly by the VLA loop for the residual iterations and next to no
> additional statements.  That would require changes in the vectorizer, though.
> 
> In total: I think the current behavior is reasonable.

Oh. I see. I just checked it now.
.L4:
vle32.v v2,0(a5)
addi a5,a5,16
vadd.vv v1,v1,v2
bne a5,a4,.L4
lui a4,%hi(.LC0)
lui a5,%hi(.LC1)
addi a4,a4,%lo(.LC0)
vlm.v   v0,0(a4)
addi a5,a5,%lo(.LC1)
andi a1,a1,-4
vmv1r.v v2,v3
vlm.v   v4,0(a5)
vcompress.vm v2,v1,v0
vmv1r.v v0,v4
vadd.vv v1,v2,v1
vcompress.vm v3,v1,v0
vadd.vv v3,v3,v1
vmv.x.s a0,v3
sext.w  a0,a0
beq a3,a1,.L12

It seems that the codegen will be even better if we support VLS mode
reduction.

I agree that we should first take the VLS reduction choice and then move to
the VLA reduction choice.

But I wonder why ARM SVE doesn't use this approach, since it also has a VLS
mode (NEON/ADVSIMD).

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153

--- Comment #4 from Robin Dapp  ---
Yes, with VLS reduction this will improve.

On aarch64 + sve I see
loop inside costs: 2
This is similar to our VLS costs.

And their loop is indeed short:

ld1w z30.s, p7/z, [x0, x2, lsl 2]
add x2, x2, x3
add z31.s, p7/m, z31.s, z30.s
whilelo p7.s, w2, w1
b.any   .L3

Not much to be squeezed out with a VLS approach.  I guess that's why.

[Bug target/105928] [AArch64] 64-bit constants with same high/low halves can use ADD lsl 32 (-Os at least)

2023-09-13 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105928

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |wilco at gcc dot gnu.org

--- Comment #2 from Wilco  ---
Shifted logical operations are single cycle on all recent cores.

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #6 from Richard Biener  ---
So the issue is that forwprop & folding has a hard time in cleaning up dead
code afterwards but it would also benefit from doing that more aggressively
(and early) because of single_use () and friends.

I'm thinking of hooking into update_stmt to discover candidates for
simple-dce-from-worklist (likely not early and aggressive enough though).

[Bug jit/111396] Segfault when using -flto with libgccjit

2023-09-13 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111396

David Malcolm  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-09-13
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from David Malcolm  ---
Thanks; I can reproduce the ICE with trunk, both with and without the patch you
reference.  Taking a look...

[Bug libstdc++/111390] libstdc++-v3/scripts/check_compile script is not useful

2023-09-13 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111390

--- Comment #7 from joseph at codesourcery dot com  ---
Stubbing out execution of tests can be done with a suitable board file (a 
board file to stub out linking as well is a bit more complicated).

https://gcc.gnu.org/pipermail/gcc/2017-September/224422.html

[Bug target/111404] New: [AArch64] 128-bit __sync_val_compare_and_swap is not atomic

2023-09-13 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111404

Bug ID: 111404
   Summary: [AArch64] 128-bit __sync_val_compare_and_swap is not
atomic
   Product: gcc
   Version: 8.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wilco at gcc dot gnu.org
  Target Milestone: ---

This compiles

__int128 f(__int128 *p, __int128 *q, __int128 x)
{
  return __sync_val_compare_and_swap (p, *q, x);
}

into:

f:
ldp x6, x7, [x1]
mov x4, x0
.L3:
ldxp x0, x1, [x4]
cmp x0, x6
ccmp x1, x7, 0, eq
bne .L4
stlxp   w5, x2, x3, [x4]
cbnz w5, .L3
.L4:
dmb ish
ret

This means if the compare fails, we return the value loaded via LDXP. However
unless the STXP succeeds, this returned value is not single-copy atomic.

So on failure we still need to execute STLXP.

[Bug c/111405] New: Problem with incorrect optimizion for "constexpr" function with possible overflow

2023-09-13 Thread 3180104919 at zju dot edu.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111405

Bug ID: 111405
   Summary: Problem with incorrect optimizion for "constexpr"
function with possible overflow
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: 3180104919 at zju dot edu.cn
  Target Milestone: ---

Created attachment 55891
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55891&action=edit
A demo file containing a function that is wrongly optimized with -O2

I happened to find this problem when I did the CSAPP lab.

int isTmax(int x) {
  // make it all of 1
  // it's quite strange that the results of x + 1 + x and x + x + 1 are different
  int c = x + x + 1;
  // check if it's all of 1
  int flag_all_ones = !(~c);
  // avoid -1
  int flag_not_neg1 = !!(x + 1);
  return flag_all_ones & flag_not_neg1;
}

This function is incorrectly optimized to return zero, but only with the "-O2"
compiler flag. In fact isTmax(0x7fffffff) should return 1. Here's the
disassembly code using coredump:

12ac :
  // check if it's all of 1
  int flag_all_ones = !(~c);
  // avoid -1
  int flag_not_neg1 = !!(x + 1);
  return flag_all_ones & flag_not_neg1;
}
12ac:   b8 00 00 00 00  mov$0x0,%eax
12b1:   c3  ret

This function is compiled correctly without compiler optimization (-O0).
The behaviour occurs consistently with the latest 2 versions of the gcc
compiler (from 11.0 to 12.0). With clang or msvc, everything works well.

Thank you for your time.

[Bug other/111406] New: libiberty build produces errors with CC=clang, unsupported option '-print-multi-os-directory'

2023-09-13 Thread dilfridge at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111406

Bug ID: 111406
   Summary: libiberty build produces errors with CC=clang,
unsupported option '-print-multi-os-directory'
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dilfridge at gentoo dot org
  Target Milestone: ---

This is a clone of https://bugs.gentoo.org/913750

With CC=clang, the build of libiberty (as part of gnu binutils)
produces errors

clang-16: error: unsupported option '-print-multi-os-directory'
clang-16: error: no input files

However, the build continues apparently fine...

This stems from libiberty/Makefile.am:

385 # This is tricky.  Even though CC in the Makefile contains
386 # multilib-specific flags, it's overridden by FLAGS_TO_PASS from the
387 # default multilib, so we have to take CFLAGS into account as well,
388 # since it will be passed the multilib flags.
389 MULTIOSDIR = `$(CC) $(CFLAGS) -print-multi-os-directory`

[Bug c/111405] Problem with incorrect optimizion for "constexpr" function with possible overflow

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111405

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andrew Pinski  ---
Signed integer overflow is undefined behavior. 

Use -fwrapv or unsigned to do the addition to get the behavior you want.
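
As an illustration of the "unsigned" suggestion, here is a minimal sketch of the reporter's function with the additions done in unsigned arithmetic, where wraparound is well defined. The name isTmax_unsigned is hypothetical, and the sketch assumes the usual two's complement representation of int.

```c
#include <limits.h>

/* Hypothetical rewrite of isTmax: every addition is performed on
   unsigned values, so none of them can trigger signed overflow.  */
int
isTmax_unsigned (int x)
{
  unsigned ux = (unsigned) x;
  /* All bits set when x is INT_MAX or -1.  */
  unsigned c = ux + ux + 1u;
  int flag_all_ones = !(~c);
  /* Zero only when x is -1.  */
  int flag_not_neg1 = !!(ux + 1u);
  return flag_all_ones & flag_not_neg1;
}
```

With this version the result no longer depends on the optimization level, since the compiler cannot assume that the additions never wrap.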

[Bug other/111406] libiberty build produces errors with CC=clang, unsupported option '-print-multi-os-directory'

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111406

--- Comment #1 from Andrew Pinski  ---
>This stems from libiberty/Makefile.am:

You mean Makefile.in, libiberty does not use automake.

[Bug c/111405] Problem with incorrect optimizion for "constexpr" function with possible overflow

2023-09-13 Thread 3180104919 at zju dot edu.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111405

--- Comment #2 from Wang Chenyu <3180104919 at zju dot edu.cn> ---
(In reply to Andrew Pinski from comment #1)
> Signed integer overflow is undefined behavior. 
> 
> Use -fwrapv or unsigned to do the addition to get the behavior you want.

I see.. Thank you for your explanation

[Bug other/111406] libiberty build produces errors with CC=clang, unsupported option '-print-multi-os-directory'

2023-09-13 Thread dilfridge at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111406

--- Comment #2 from Andreas K. Huettel  ---
Indeed, sorry

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=libiberty/Makefile.in#l388

[Bug tree-optimization/111294] [14 Regression] Missed Dead Code Elimination since r14-573-g69f1a8af45d

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294

--- Comment #7 from Andrew Pinski  ---
Created attachment 55892
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55892&action=edit
version of using simple_dce_from_worklist in forwprop

This is a version of using simple_dce_from_worklist in forwprop I had tried at
one point, but I don't remember why I didn't finish up this patch.

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-13 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #8 from AK  ---

> this does seem like a HW issue. Are you sure you have a decent RISCV machine 
> without any memory issues?
> I suspect ninja is building with all of the cores which pushes the memory 
> usage high.

possible. I have the https://sipeed.com/licheepi4a (licheepi 4a board)


> Maybe lower the clock speed of the CPU you are using.

will do. thanks

[Bug c/111400] Missing return sanitization only works in C++

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111400

--- Comment #5 from Andrew Pinski  ---
To be able to detect this, an ABI change would be needed as you need to pass
back if the function fell through or not. Now for (non-address taken) static
functions that should be ok. The check should happen on the caller side rather
than the callee side as it is only undefined if the caller uses the value ...

[Bug tree-optimization/111407] New: ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

Bug ID: 111407
   Summary: ICE: SSA corruption due to widening_mul opt on
conflict across an abnormal edge
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: qinzhao at gcc dot gnu.org
  Target Milestone: ---

This bug was originally reported against GCC 8.5 with profile feedback.
There were multiple similar failures due to this issue in our large
application.

Although we reduced the test case to a very small size and changed the
variable names, the failure can only be reproduced with -fprofile-use and the
.gcda files. As a result, we cannot share the test case.

With the small test case, and by debugging GCC 8, I finally located the
issue:

This is a bug in tree-ssa-math-opts.cc. When applying the widening mul
optimization, the compiler needs to check whether an operand occurs in an
ABNORMAL PHI; if so, it should avoid the transformation.

The following patch against GCC 8 fixes the failure:

diff -u -r -N -p gcc-8.5.0-20210514-org/gcc/tree-ssa-math-opts.c
gcc-8.5.0-20210514/gcc/tree-ssa-math-opts.c
--- gcc-8.5.0-20210514-org/gcc/tree-ssa-math-opts.c 2023-09-11
21:04:17.891403319 +
+++ gcc-8.5.0-20210514/gcc/tree-ssa-math-opts.c 2023-09-13 15:35:44.962336530
+
@@ -2346,6 +2346,14 @@ convert_mult_to_widen (gimple *stmt, gim
if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
return false;

+ /* If either rhs1 or rhs2 is subject to abnormal coalescing,
+ * avoid the transform. */
+ if ((TREE_CODE (rhs1) == SSA_NAME
+ && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs1))
+ || (TREE_CODE (rhs2) == SSA_NAME
+ && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs2)))
+ return false;
+
to_mode = SCALAR_INT_TYPE_MODE (type);
from_mode = SCALAR_INT_TYPE_MODE (type1);
if (to_mode == from_mode)

I checked the latest upstream GCC 14 and found that the function
"convert_mult_to_widen" has the same issue and needs to be patched as well.

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

Robin Dapp  changed:

   What|Removed |Added

 CC||rdapp at gcc dot gnu.org

--- Comment #2 from Robin Dapp  ---
I played around with this a bit.  Emitting a COND_LEN in if-convert is easy:

_ifc__35 = .COND_ADD (_23, init_20, _8, init_20);

However, during reduction handling we rely on the reduction being a gimple
assign and a binary operation, so I needed to fix up some places and indices
as well as use the proper mask.

What complicates things a bit is that we assume that "init_20" (i.e. the
reduction def) occurs once when we have it twice in the COND_ADD.  I just
special cased that for now.  Is this the proper thing to do?

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 23c6e8259e7..e99add3cf16 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared
*shared)
 static bool
 fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
 {
-  if (code == PLUS_EXPR)
+  if (code == PLUS_EXPR || code == IFN_COND_ADD)
 {
   *reduc_fn = IFN_FOLD_LEFT_PLUS;
   return true;
@@ -4106,8 +4106,11 @@ vect_is_simple_reduction (loop_vec_info loop_info,
stmt_vec_info phi_info,
   return NULL;
 }

-  nphi_def_loop_uses++;
-  phi_use_stmt = use_stmt;
+  if (use_stmt != phi_use_stmt)
+   {
+ nphi_def_loop_uses++;
+ phi_use_stmt = use_stmt;
+   }

@@ -7440,6 +7457,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   if (i == STMT_VINFO_REDUC_IDX (stmt_info))
continue;

+  if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
+   continue;
+

Apart from that I think what's mainly missing is making the added code nicer. 
Going to attach a tentative patch later.

[Bug c/111398] GCC should warn if a struct with flexible array member is declared static or onstack

2023-09-13 Thread tg at mirbsd dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111398

--- Comment #2 from Thorsten Glaser  ---
Right, which is why I suggested a -Wextra level option to warn about these.

Thanks!

[Bug tree-optimization/111407] ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-09-13
 Status|UNCONFIRMED |WAITING

--- Comment #1 from Andrew Pinski  ---
Do you have a testcase?

[Bug tree-optimization/111407] ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

--- Comment #2 from Andrew Pinski  ---
Also what target is this for?
I suspect aarch64 since x86_64 does not have widening multiply ...

[Bug tree-optimization/111407] ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

--- Comment #3 from qinzhao at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> Do you have a testcase?

I have, but I cannot expose it to public.
I can provide the Bad IR and Good IR if you think it's okay.

[Bug tree-optimization/111407] ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

--- Comment #4 from qinzhao at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #2)
> Also what target is this for?
> I suspect aarch64 since x86_64 does not have widening multiply ...

you are right, it's aarch64.

[Bug tree-optimization/111407] ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

Andrew Pinski  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #5 from Andrew Pinski  ---
Testcase:
```
enum { SEND_TOFILE } __sigsetjmp();
void fclose();
void foldergets();
void sendpart_stats(int *p1, int a1, int b1) {
  int *a = p1;
  fclose();
  p1 = 0;
  long t = b1;
  if (__sigsetjmp()) {
{
  long t1 = a1;
  a1+=1;
  fclose(a1*(long)t1);
}
  }
  if (p1)
fclose();
}
```

[Bug tree-optimization/111407] ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

--- Comment #6 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #5)
> Testcase:

The way I figured out this testcase was trial and error and starting with the
testcase from PR 69167 .

[Bug tree-optimization/94589] Optimize (i<=>0)>0 to i>0

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589

--- Comment #26 from Andrew Pinski  ---
Created attachment 55893
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55893&action=edit
Here is my idea around the patch for prototype for doing the constant prop idea

This is like the one in comment #13 but handles it in phiopt rather than
forwprop.

[Bug tree-optimization/111407] ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

--- Comment #7 from qinzhao at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #5)
> Testcase:
Thanks a lot for the test case. GCC 8 fails with it; disabling
tree-widening_mul fixes the failure, and my patch for GCC 8 also fixes it.

I will test GCC 14 as well.

[Bug tree-optimization/111407] ICE: SSA corruption due to widening_mul opt on conflict across an abnormal edge

2023-09-13 Thread qinzhao at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111407

--- Comment #8 from qinzhao at gcc dot gnu.org ---
the latest GCC14 has the same issue.

with the patch proposed in comment #1, the failure has been fixed.

[Bug driver/86030] specs file processing does not create response files for input directories

2023-09-13 Thread john.soo+gcc-bugzilla at arista dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86030

--- Comment #14 from John Soo  ---
> Here though it seems that you are dealing with another sort of limit which is
> much larger (I have seen 128K being mentioned on the GH page). If this
> somehow corrupts the command line, it wouldn't help if that command line went
> into a response file because it would still be wrong. To my knowledge,
> Linux-based systems don't have a command line length limitation, so I can't
> see how a response file approach would be useful at the point where the
> subprocess is spawned. Whether something similar can be used at an earlier
> point to save it from the 128K limit, whatever it is, is unknown to me.

It is a much larger limit (ARG_MAX resulting in E2BIG), but it is fundamentally
the same problem. I think we should assume that the command line is correct and
still respect ARG_MAX on linux/unix systems, too. It seems to me that the
temporary response file is the best way to do this.
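
To make the ARG_MAX point concrete, here is a rough C sketch of how a program could decide whether an argument vector is at risk of hitting E2BIG. It is not how the GCC driver works; the helper names argv_bytes and needs_response_file are hypothetical, and the factor of one half is an assumed safety margin that leaves room for the environment, which counts against the same limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: approximate space the argument vector occupies,
   counting each string, its terminating NUL, and one pointer per
   argument.  */
size_t
argv_bytes (char *const argv[])
{
  size_t total = 0;
  for (size_t i = 0; argv[i] != NULL; i++)
    total += strlen (argv[i]) + 1 + sizeof (char *);
  return total;
}

/* Hypothetical policy: fall back to a response file once the arguments
   use half of ARG_MAX (assumed margin for the environment).  */
bool
needs_response_file (char *const argv[])
{
  long limit = sysconf (_SC_ARG_MAX);
  if (limit <= 0)
    return false; /* Limit is indeterminate; nothing to compare with.  */
  return argv_bytes (argv) >= (size_t) limit / 2;
}
```

A caller would check needs_response_file before spawning the subprocess and, if it returns true, write the arguments to a temporary file and pass that file instead.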

[Bug tree-optimization/78512] [7 Regression] r242674 miscompiles Linux kernel

2023-09-13 Thread ndesaulniers at google dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78512

Nick Desaulniers  changed:

   What|Removed |Added

 CC||ndesaulniers at google dot com

--- Comment #10 from Nick Desaulniers  ---
I'm not super happy that GCC has false-negatives when %p is encountered.  Bugs
do exist outside of the Linux kernel with the usage of %p that could be
flagged.

Clang-18 has recently added -Wno-format-overflow-non-kprintf and
-Wformat-truncation-non-kprintf to emulate this behavior in
https://github.com/llvm/llvm-project/pull/65969, which we will use in the
kernel
https://github.com/ClangBuiltLinux/linux/issues/1923#issuecomment-1718144462.

At the least, I think this behavior wrt. %p should either be documented, or
-Wno-format-overflow-non-kprintf and -Wformat-truncation-non-kprintf
implemented in GCC.

That said, this diagnostic catches real bugs!  Linus turned them off, but we
will work through fixing the instances identified towards the goal of getting
them re-enabled for the Linux kernel.
https://github.com/KSPP/linux/issues/343

[Bug c++/59526] [C++11] Defaulted special member functions don't accept noexcept if a member has a non-trivial noexcept operator in the corresponding special member function

2023-09-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59526

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Francois Dumont :

https://gcc.gnu.org/g:92456291849fe88303bbcab366f41dcd4a885ad5

commit r14-3926-g92456291849fe88303bbcab366f41dcd4a885ad5
Author: François Dumont 
Date:   Wed Aug 23 19:15:43 2023 +0200

libstdc++: [_GLIBCXX_INLINE_VERSION] Fix  friend declaration

GCC do not consider the inline namespace in friend function declarations.
This is PR c++/59526, we need to explicit this namespace.

libstdc++-v3/ChangeLog:

* include/std/format (std::__format::_Arg_store): Explicit version
namespace on make_format_args friend declaration.

[Bug tree-optimization/111408] New: [14 Regression] Wrong code at -O2/3 on x86_64-linux-gnu since r14-2866-ge68a31549d9

2023-09-13 Thread shaohua.li at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111408

Bug ID: 111408
   Summary: [14 Regression] Wrong code at -O2/3 on
x86_64-linux-gnu since r14-2866-ge68a31549d9
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: shaohua.li at inf dot ethz.ch
  Target Milestone: ---

gcc at -O2/3 produced the wrong code.

Bisected to r14-2866-ge68a31549d9

Compiler explorer: https://godbolt.org/z/secjqP8ao

$ cat a.c
int printf(const char *, ...);
int a, b, c, d;
short e;
int f() {
  c = a % (sizeof(int) * 8);
  if (b & 1 << c)
    return -1;
  return 0;
}
int main() {
  for (; e != 1; e++) {
    int g = f();
    if (g + d - 9 + d)
      continue;
    for (;;)
      __builtin_abort();
  }
}
$
$ gcc -O0 a.c && ./a.out
$
$ gcc -O2 a.c && ./a.out
[2]1281121 abort  ./a.out
$

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #3 from Robin Dapp  ---
Several other things came up, so I'm just going to post the latest status here
without having revised or tested it.  Going to try fixing it and testing
tomorrow.

--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared
*shared)
 static bool
 fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
 {
-  if (code == PLUS_EXPR)
+  if (code == PLUS_EXPR || code == IFN_COND_ADD)
 {
   *reduc_fn = IFN_FOLD_LEFT_PLUS;
   return true;
@@ -4106,8 +4106,13 @@ vect_is_simple_reduction (loop_vec_info loop_info,
stmt_vec_info phi_info,
   return NULL;
 }

-  nphi_def_loop_uses++;
-  phi_use_stmt = use_stmt;
+  /* We might have two uses in the same instruction, only count them as
+one. */
+  if (use_stmt != phi_use_stmt)
+   {
+ nphi_def_loop_uses++;
+ phi_use_stmt = use_stmt;
+   }
 }

   tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
@@ -6861,7 +6866,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple **vec_stmt, slp_tree slp_node,
   gimple *reduc_def_stmt,
   tree_code code, internal_fn reduc_fn,
-  tree ops[3], tree vectype_in,
+  tree *ops, int num_ops, tree vectype_in,
   int reduc_index, vec_loop_masks *masks,
   vec_loop_lens *lens)
 {
@@ -6883,11 +6888,24 @@ vectorize_fold_left_reduction (loop_vec_info
loop_vinfo,
 gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
  TYPE_VECTOR_SUBPARTS (vectype_in)));

-  tree op0 = ops[1 - reduc_index];
+  /* The operands either come from a binary operation or a COND_ADD operation.
+ The former is a gimple assign and the latter is a gimple call with four
+ arguments.  */
+  gcc_assert (num_ops == 2 || num_ops == 4);
+  bool is_cond_add = num_ops == 4;
+  tree op0, opmask;
+  if (!is_cond_add)
+op0 = ops[1 - reduc_index];
+  else
+{
+  op0 = ops[2];
+  opmask = ops[0];
+  gcc_assert (!slp_node);
+}
   int group_size = 1;
   stmt_vec_info scalar_dest_def_info;
-  auto_vec<tree> vec_oprnds0;
+  auto_vec<tree> vec_oprnds0, vec_opmask;
   if (slp_node)
 {
   auto_vec<vec<tree> > vec_defs (2);
@@ -6903,9 +6921,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
 op0, &vec_oprnds0);
   scalar_dest_def_info = stmt_info;
+  if (is_cond_add)
+   {
+ vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
+opmask, &vec_opmask);
+ gcc_assert (vec_opmask.length() == 1);
+   }
 }

-  tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
+  gimple *sdef = scalar_dest_def_info->stmt;
+  tree scalar_dest = is_gimple_call (sdef)
+  ? gimple_call_lhs (sdef)
+  : gimple_assign_lhs (scalar_dest_def_info->stmt);
   tree scalar_type = TREE_TYPE (scalar_dest);
   tree reduc_var = gimple_phi_result (reduc_def_stmt);

@@ -6945,7 +6972,11 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   i, 1);
  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
  bias = build_int_cst (intQI_type_node, biasval);
- mask = build_minus_one_cst (truth_type_for (vectype_in));
+ /* If we have a COND_ADD take its mask.  Otherwise use {-1, ...}.  */
+ if (is_cond_add)
+   mask = vec_opmask[0];
+ else
+   mask = build_minus_one_cst (truth_type_for (vectype_in));
}

   /* Handle MINUS by adding the negative.  */
@@ -7440,6 +7471,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   if (i == STMT_VINFO_REDUC_IDX (stmt_info))
continue;

+  if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
+   continue;
+
   /* There should be only one cycle def in the stmt, the one
  leading to reduc_def.  */
   if (VECTORIZABLE_CYCLE_DEF (dt))
@@ -8211,8 +8245,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   vec_num = 1;
 }

-  code_helper code = canonicalize_code (op.code, op.type);
-  internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
+  code_helper code (op.code);
+  internal_fn cond_fn;
+
+  if (code.is_internal_fn ())
+{
+  internal_fn ifn = internal_fn (op.code);
+  code = canonicalize_code (conditional_internal_fn_code (ifn), op.type);
+  cond_fn = ifn;
+}
+  else
+{
+  code = canonicalize_code (op.code, op.type);
+  cond_fn = get_conditional_internal_fn (code, op.type);
+}
+
   vec_loop_masks *masks = &LOOP_
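
For readers following the operand handling in the patch above, here is a minimal stand-alone sketch of the scalar semantics it relies on. It assumes the internal function behaves as `.COND_ADD (mask, a, b, else) = mask ? a + b : else`, so that in the call `.COND_ADD (_23, init_20, _8, init_20)` from comment #2 operand 0 is the mask and operand 2 is the value added to the reduction. The names `cond_add` and `fold_left_cond_add` are hypothetical and are not GCC functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of .COND_ADD (mask, a, b, else):
// the result is a + b where the mask is set, otherwise the else value.
double cond_add (bool mask, double a, double b, double els)
{
  return mask ? a + b : els;
}

// In-order (fold-left) reduction over the lanes.  The accumulator is passed
// as both operand 1 and the else value, mirroring
// .COND_ADD (_23, init_20, _8, init_20); masked-off lanes leave it untouched.
double fold_left_cond_add (double init,
                           const std::vector<bool> &mask,
                           const std::vector<double> &vals)
{
  assert (mask.size () == vals.size ());
  double acc = init;
  for (std::size_t i = 0; i < vals.size (); ++i)
    acc = cond_add (mask[i], acc, vals[i], acc);
  return acc;
}
```

In this model the mask corresponds to `ops[0]` and the addend to `ops[2]`, which is how the patch picks `opmask` and `op0` in the four-operand case.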

[Bug debug/111409] New: Invalid .debug_macro.dwo macro information for split DWARF

2023-09-13 Thread osandov at osandov dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111409

Bug ID: 111409
   Summary: Invalid .debug_macro.dwo macro information for split
DWARF
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: osandov at osandov dot com
  Target Milestone: ---

When using -g3 -gsplit-dwarf, the generated macro information has a couple of
issues.

I'm using the following trivial source file:

$ cat test.c
#define ZERO 0
int main(void)
{
return ZERO;
}

First, GCC emits DW_MACRO_import entries, but they always have an offset of 0:

$ gcc -g3 -gsplit-dwarf test.c
$ readelf --debug-dump=macro a-test.dwo | head -n 14
Contents of the .debug_macro.dwo section:

  Offset:  0x0
  Version: 5
  Offset size: 4
  Offset into .debug_line: 0x0

 DW_MACRO_import - offset : 0x0
 DW_MACRO_start_file - lineno: 0 filenum: 1
 DW_MACRO_start_file - lineno: 0 filenum: 2
 DW_MACRO_import - offset : 0x0
 DW_MACRO_end_file
 DW_MACRO_define_strx lineno : 1 macro : 
 DW_MACRO_end_file

Second, each macro unit is in its own .debug_macro.dwo section:

$ readelf -S -W a-test.dwo | grep -F .debug_macro
  [ 3] .debug_macro.dwo  PROGBITS b2 1e 00   E  0   0  1
  [ 4] .debug_macro.dwo  PROGBITS d0 00059e 00      0   0  1
  [ 5] .debug_macro.dwo  PROGBITS 00066e 1b 00      0   0  1

As far as I can tell, the DWARF specification doesn't allow this, and tools
seem to only use either the first or last section (gdb only finds the first
one, and dwp only copies the last one into the .dwp file).

These seem to have the same underlying cause: when not using split DWARF, the
linker deduplicates units and relocates the import offsets appropriately, but
this is not possible with split DWARF. I imagine that the fix would be to not
use imports for split DWARF and only generate one macro unit per .dwo file
containing everything.

(P.S., -g3 -gdwarf-4 -gstrict-dwarf -gsplit-dwarf generates a valid
.debug_macinfo.dwo because it doesn't have a notion of imports.)

[Bug target/111408] [14 Regression] Wrong code at -O2/3 on x86_64-linux-gnu since r14-2866-ge68a31549d9

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111408

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 Target||x86_64-linux-gnu
   Keywords||wrong-code
  Component|tree-optimization   |target

[Bug target/111408] [14 Regression] Wrong code at -O2/3 on x86_64-linux-gnu since r14-2866-ge68a31549d9

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111408

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-09-13
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug target/111408] [14 Regression] Wrong code at -O2/3 on x86_64-linux-gnu since r14-2866-ge68a31549d9

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111408

--- Comment #2 from Andrew Pinski  ---
GCC 13.2:
sarl    %cl, %eax
movl    d(%rip), %ecx
andl    $1, %eax
andl    $31, %edx
leal    -9(%rcx,%rcx), %ecx
cmpl    %eax, %ecx

While the trunk:
movl    a(%rip), %eax
movl    b(%rip), %ecx
movl    %eax, %edx
andl    $31, %edx
btl     %eax, %ecx

The trunk somehow missed the whole 2*d - 9 part ...
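
As a rough aid for reading the two listings, the sketch below models what the individual instructions compute. `leal -9(%rcx,%rcx), %ecx` forms `rcx + rcx - 9`, which is the `2*d - 9` term, and `btl %eax, %ecx` tests bit `eax mod 32` of `ecx`. The function and parameter names are hypothetical. The original test case is not reproduced here and the listings are only excerpts, so this illustrates single instructions rather than the whole computation.

```cpp
#include <cassert>
#include <cstdint>

// leal -9(%rcx,%rcx), %ecx: base + index - 9 with 32-bit wraparound.
std::uint32_t lea_2d_minus_9 (std::uint32_t d)
{
  return d + d - 9u;
}

// sarl %cl, %eax; andl $1, %eax: right shift, then keep bit 0.
// The count is masked to 5 bits, as the hardware does for 32-bit shifts.
// (Right-shifting a negative int is arithmetic with GCC.)
std::uint32_t sar_and_1 (std::int32_t x, std::uint32_t count)
{
  return static_cast<std::uint32_t> (x >> (count & 31u)) & 1u;
}

// btl %eax, %ecx: the carry flag receives bit (eax mod 32) of ecx.
bool bt (std::uint32_t bit_index, std::uint32_t value)
{
  return ((value >> (bit_index & 31u)) & 1u) != 0;
}
```

In these terms the GCC 13.2 excerpt ends by comparing a `sar_and_1`-style bit against `lea_2d_minus_9 (d)`, while the trunk excerpt contains only the `bt`-style test and no counterpart of the `leal`.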

[Bug target/110751] RISC-V: Support undefined value that allows VSETVL PASS use TA/MA

2023-09-13 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 CC|richard.sandiford at arm dot com   |

--- Comment #39 from rsandifo at gcc dot gnu.org  ---
(In reply to Richard Biener from comment #34)
> On RTL there are "special" RTXen used for this kind of stuff, like
> (use:mode ..) or (clobber const0_rtx), but I'm the wrong person to
> ask which one would be most appropriate for a general operand in
> an otherwise generic instruction.  Maybe Richard has a guess.
I think the best bet with existing RTL is (scratch:).  It's not an exact
fit for current usage (or for the documentation), but it's similar in spirit to
the scratch in (mem:BLK (scratch:P)) (which also isn't an exact match for the
documentation).

I don't expect this to work out of the box.  Some changes to target-independent
code will be needed.  But if we restrict this use to expanders for now, the
changes should be relatively small.  I think the main thing would be to make
maybe_legitimize_operand turn a scratch rtx into a fresh pseudo if the
predicate doesn't accept a scratch.  We'd then be restoring the semantics of an
uninitialised SSA_NAME.

If we did that, I think we could convert uninitialised SSA_NAMEs into SCRATCHes
for everything that goes through expand_fn_using_insn.  There should be no need
to restrict it to COND_* functions.
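
The fallback described above can be pictured with a small stand-alone model. Everything in it is hypothetical: `Operand`, `PseudoAllocator` and `legitimize_undefined_operand` are illustrative stand-ins and are not GCC's RTL types or the real `maybe_legitimize_operand`. The sketch only shows the decision: keep the scratch when the predicate accepts it, otherwise substitute a fresh pseudo.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for RTL operands; these are not GCC types.
enum class OperandKind { Scratch, Pseudo };

struct Operand
{
  OperandKind kind;
  int regno;   // Only meaningful for pseudos.
};

// Hands out new pseudo register numbers (hypothetical helper).
struct PseudoAllocator
{
  int next = 100;
  Operand fresh () { return Operand{OperandKind::Pseudo, next++}; }
};

// If the predicate accepts the scratch, keep it so the pattern can treat the
// value as undefined.  Otherwise fall back to a fresh, never-initialised
// pseudo, which matches the semantics of an uninitialised SSA_NAME.
Operand
legitimize_undefined_operand (const Operand &op,
                              const std::function<bool (const Operand &)> &predicate,
                              PseudoAllocator &alloc)
{
  if (op.kind == OperandKind::Scratch && !predicate (op))
    return alloc.fresh ();
  return op;
}
```

A target whose predicate accepts scratches would see the operand unchanged, while any other target would get an ordinary register operand and need no pattern changes.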

[Bug tree-optimization/106164] (a > b) & (a >= b) does not get optimized until reassoc1

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106164

--- Comment #14 from Andrew Pinski  ---
Created attachment 55894
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55894&action=edit
Third patch to support the constants that are off by one

This patch adds what I mentioned was missing in comment #9.

[Bug tree-optimization/106164] (a > b) & (a >= b) does not get optimized until reassoc1

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106164

--- Comment #15 from Andrew Pinski  ---
Created attachment 55895
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55895&action=edit
testcases for the constants off by one

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

--- Comment #20 from CVS Commits  ---
The master branch has been updated by LuluCheng :

https://gcc.gnu.org/g:9a033b9feffc9d97d5acfe8ca3cd16359f4b714b

commit r14-3974-g9a033b9feffc9d97d5acfe8ca3cd16359f4b714b
Author: Lulu Cheng 
Date:   Mon Sep 11 16:20:29 2023 +0800

LoongArch: Fix bug of 'di3_fake'.

PR target/111334

gcc/ChangeLog:

* config/loongarch/loongarch.md: Fix bug of 'di3_fake'.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr111334.c: New test.

[Bug c++/111410] New: Bogus Wdangling-reference warning with ranges pipe expression in for loop

2023-09-13 Thread hewillk at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111410

Bug ID: 111410
   Summary: Bogus Wdangling-reference warning with ranges pipe
expression in for loop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hewillk at gmail dot com
  Target Milestone: ---

#include 
#include 
#include 

int main()
{
  std::vector v{1, 2, 3, 4, 5};
  for (auto i : std::span{v} | std::views::take(1))
std::cout << i << '\n';
}

GCC-trunk reports the following warning when the -Wall flag is used:

:8:51: warning: possibly dangling reference to a temporary
[-Wdangling-reference]
8 |   for ( auto i : std::span{v} | std::views::take(1))
  |   ^


https://godbolt.org/z/5jhnTTej9

[Bug tree-optimization/111402] Loop distribution fail to optimize memmove for multiple consecutive moves within a loop

2023-09-13 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111402

--- Comment #2 from Hongtao.liu  ---
After adjusting the code in foo1 to use < n instead of != n, the issue remains.

void
foo1 (v4di* __restrict a, v4di *b, int n)
{
  for (int i = 0; i < n; i+=2)
{
a[i] = b[i];
a[i+1] = b[i+1];   
}
}
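
For comparison, here is a hand-written sketch of the single block copy the loop amounts to. It assumes a plain struct as a stand-in for the `v4di` vector type, and `foo1_as_block_copy` is a hypothetical name. Since `a` is `__restrict`, the sketch uses `memcpy`. The loop copies pairs of elements, so it touches `n` rounded up to an even count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for the v4di vector type from the test case (an assumption:
// four 64-bit lanes, modelled here as a plain struct).
struct v4di
{
  long long lane[4];
};

// The loop from the comment, unchanged apart from the stand-in type.
void foo1 (v4di *__restrict a, v4di *b, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      a[i] = b[i];
      a[i + 1] = b[i + 1];
    }
}

// Hand-written equivalent: the loop copies elements 0 .. count-1, where
// count is n rounded up to an even number (nothing is copied when n <= 0).
void foo1_as_block_copy (v4di *__restrict a, v4di *b, int n)
{
  if (n <= 0)
    return;
  std::size_t count = (static_cast<std::size_t> (n) + 1) / 2 * 2;
  std::memcpy (a, b, count * sizeof (v4di));
}
```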

[Bug target/111334] [14 regression] ICE is reported during the combine pass optimization

2023-09-13 Thread chenglulu at loongson dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111334

chenglulu  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #21 from chenglulu  ---
fixed

[Bug target/111411] New: [14 regression] ICE when building opus-1.4 (celt_decoder.c:1182:1: internal compiler error: in extract_insn, at recog.cc:2791)

2023-09-13 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111411

Bug ID: 111411
   Summary: [14 regression] ICE when building opus-1.4
(celt_decoder.c:1182:1: internal compiler error: in
extract_insn, at recog.cc:2791)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sjames at gcc dot gnu.org
  Target Milestone: ---

Created attachment 55896
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55896&action=edit
celt_decoder.c.i

```
gcc (Gentoo 14.0.0 p, commit d0b55776a4e1d2f293db5ba0e4a04aefed055ec4) 14.0.0
20230913 (experimental) 3af2af15798cb6243e2643f98f62c9270b1ca5d2
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

```
FAILED: celt/libopus-celt.a.p/celt_decoder.c.o
aarch64-unknown-linux-gnu-gcc -Icelt/libopus-celt.a.p -Icelt -I../opus-1.4/celt
-I. -I../opus-1.4 -Iinclude -I../opus-1.4/include -Isilk -I../opus-1.4/silk
-fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra
-std=gnu99 -DOPUS_BUILD -DHAVE_CONFIG_H -fvisibility=hidden -Wcast-align
-Wnested-externs -Wshadow -Wstrict-prototypes -fstack-protector-strong -O3
-pipe -mcpu=native -fdiagnostics-color=always -ggdb3 -fPIC -MD -MQ
celt/libopus-celt.a.p/celt_decoder.c.o -MF
celt/libopus-celt.a.p/celt_decoder.c.o.d -o
celt/libopus-celt.a.p/celt_decoder.c.o -c ../opus-1.4/celt/celt_decoder.c
../opus-1.4/celt/celt_decoder.c: In function ‘celt_decode_with_ec’:
../opus-1.4/celt/celt_decoder.c:1182:1: error: unrecognizable insn:
 1182 | }
  | ^
(insn 5312 42 41 42 (parallel [
(set (mem/c:SI (plus:DI (reg/f:DI 29 x29)
(const_int -260 [0xfefc])) [18 %sfp+-260 S4
A32])
(const_int 0 [0]))
(set (mem/c:SI (plus:DI (reg/f:DI 29 x29)
(const_int -256 [0xff00])) [18 %sfp+-256 S4
A32])
(const_int 0 [0]))
]) "../opus-1.4/celt/celt_decoder.c":978:22 -1
 (nil))
during RTL pass: cprop_hardreg
../opus-1.4/celt/celt_decoder.c:1182:1: internal compiler error: in extract_insn, at recog.cc:2791
0xe5b09473 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /usr/src/debug/sys-devel/gcc-14.0.0./gcc-14.0.0./gcc/rtl-error.cc:108
0xe5b0951b _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /usr/src/debug/sys-devel/gcc-14.0.0./gcc-14.0.0./gcc/rtl-error.cc:116
0xe626d693 extract_insn(rtx_insn*)
        /usr/src/debug/sys-devel/gcc-14.0.0./gcc-14.0.0./gcc/recog.cc:2791
0xe6273fb3 extract_constrain_insn(rtx_insn*)
        /usr/src/debug/sys-devel/gcc-14.0.0./gcc-14.0.0./gcc/recog.cc:2690
0xe6278d5b copyprop_hardreg_forward_1
        /usr/src/debug/sys-devel/gcc-14.0.0./gcc-14.0.0./gcc/regcprop.cc:836
0xe627a7e3 execute
        /usr/src/debug/sys-devel/gcc-14.0.0./gcc-14.0.0./gcc/regcprop.cc:1423
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://bugs.gentoo.org/> for instructions.
```

gcc -c celt_decoder.c.i -O3 is enough to repro.

[Bug target/111411] [14 regression] ICE when building opus-1.4 (celt_decoder.c:1182:1: internal compiler error: in extract_insn, at recog.cc:2791)

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111411

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Keywords||ice-on-valid-code

[Bug target/111411] [14 regression] ICE when building opus-1.4, unrecognizable insn with -fstack-protector-strong

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111411

Andrew Pinski  changed:

   What|Removed |Added

Summary|[14 regression] ICE when|[14 regression] ICE when
   |building opus-1.4   |building opus-1.4,
   |(celt_decoder.c:1182:1: |unrecognizable insn with
   |internal compiler error: in |-fstack-protector-strong
   |extract_insn, at|
   |recog.cc:2791)  |

--- Comment #1 from Andrew Pinski  ---
I am 99% sure this was caused by the patch set that Richard S. did a few days
ago.

[Bug target/111411] [14 regression] ICE when building opus-1.4, unrecognizable insn with -fstack-protector-strong

2023-09-13 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111411

--- Comment #2 from Sam James  ---
Created attachment 55897
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55897&action=edit
reduced.i

I've attached a reduced version but the memcpy bit could do with cleaning up
for it to be a bit more sensible.

[Bug tree-optimization/106164] (a > b) & (a >= b) does not get optimized until reassoc1

2023-09-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106164

--- Comment #16 from Andrew Pinski  ---
Next patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630241.html

[Bug target/111372] libgcc: RISCV C++ exception handling stack usage grew in 13.1

2023-09-13 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111372

--- Comment #5 from Kito Cheng  ---
> Ok, but it's better to have configure option or something else just
> for toolchains that definitely do not use vector extension

I can understand that there would be such a demand in the embedded world, but
that's not a critical issue, so it won't get high priority from most RISC-V GCC
developers. It would be appreciated if you could send a patch for that.

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #4 from rguenther at suse dot de  ---
On Wed, 13 Sep 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
> 
> Robin Dapp  changed:
> 
>What|Removed |Added
> 
>  CC||rdapp at gcc dot gnu.org
> 
> --- Comment #2 from Robin Dapp  ---
> I played around with this a bit.  Emitting a COND_LEN in if-convert is easy:
> 
> _ifc__35 = .COND_ADD (_23, init_20, _8, init_20);
> 
> However, during reduction handling we rely on the reduction being a gimple
> assign and binary operation, so I needed to fix some places and indices
> as well as the proper mask.
> 
> What complicates things a bit is that we assume that "init_20" (i.e. the
> reduction def) occurs once when we have it twice in the COND_ADD.  I just
> special cased that for now.  Is this the proper thing to do?

I think so - we should ignore a use in the else value when the other
use is in that same stmt.
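
The counting rule can be sketched in isolation as follows. This is a hypothetical model, not the vectorizer code: statements are represented by integer ids, the function name is made up, and uses that belong to the same statement are assumed to appear next to each other in the list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: each entry is the id of the statement in which one
// use of the reduction PHI result appears, in immediate-use order.
// Ids are assumed to be non-negative.
int count_phi_def_loop_uses (const std::vector<int> &use_stmt_ids)
{
  int nphi_def_loop_uses = 0;
  int phi_use_stmt = -1;   // -1 stands for "no use statement seen yet".
  for (int use_stmt : use_stmt_ids)
    if (use_stmt != phi_use_stmt)
      {
        // Two uses in the same statement, such as the accumulator and the
        // else value of a COND_ADD, are counted only once.
        ++nphi_def_loop_uses;
        phi_use_stmt = use_stmt;
      }
  return nphi_def_loop_uses;
}
```

With this rule a COND_ADD that mentions the reduction def twice still counts as a single use, whereas uses in two different statements still count as two.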

> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 23c6e8259e7..e99add3cf16 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> -  if (code == PLUS_EXPR)
> +  if (code == PLUS_EXPR || code == IFN_COND_ADD)
>  {
>*reduc_fn = IFN_FOLD_LEFT_PLUS;
>return true;
> @@ -4106,8 +4106,11 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
>return NULL;
>  }
> 
> -  nphi_def_loop_uses++;
> -  phi_use_stmt = use_stmt;
> +  if (use_stmt != phi_use_stmt)
> +   {
> + nphi_def_loop_uses++;
> + phi_use_stmt = use_stmt;
> +   }
> 
> @@ -7440,6 +7457,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>if (i == STMT_VINFO_REDUC_IDX (stmt_info))
> continue;
> 
> +  if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
> +   continue;
> +
> 
> Apart from that I think what's mainly missing is making the added code nicer. 
> Going to attach a tentative patch later.
> 
>

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #5 from rguenther at suse dot de  ---
On Wed, 13 Sep 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
> 
> --- Comment #3 from Robin Dapp  ---
> Several other things came up, so I'm just going to post the latest status here
> without having revised or tested it.  Going to try fixing it and testing
> tomorrow.

I think what's important to do is make sure targets without
masking are still getting the cond-reduction code generation
(but with the signed-zero issue fixed).  Using a cond_add is
probably better than the vec_cond + add even for the not
fold-left reduction case.
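
One plausible reading of the signed-zero issue is sketched below; the bug itself may involve more than this. When masked-off lanes are replaced by +0.0 and then added, an accumulator holding -0.0 becomes +0.0, whereas a conditional add leaves it alone. The function names are hypothetical scalar models, and the code assumes default IEEE round-to-nearest arithmetic without -ffast-math.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// vec_cond + add style: masked-off lanes contribute +0.0 to the sum.
double reduce_select_then_add (double init, const std::vector<bool> &mask,
                               const std::vector<double> &vals)
{
  double acc = init;
  for (std::size_t i = 0; i < vals.size (); ++i)
    acc = acc + (mask[i] ? vals[i] : 0.0);
  return acc;
}

// cond_add style: masked-off lanes skip the addition entirely.
double reduce_cond_add (double init, const std::vector<bool> &mask,
                        const std::vector<double> &vals)
{
  double acc = init;
  for (std::size_t i = 0; i < vals.size (); ++i)
    acc = mask[i] ? acc + vals[i] : acc;
  return acc;
}
```

Starting from -0.0 with every lane masked off, the first variant yields +0.0 and the second keeps -0.0; with at least one active non-zero lane both agree.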

[Bug debug/111409] Invalid .debug_macro.dwo macro information for split DWARF

2023-09-13 Thread osandov at osandov dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111409

--- Comment #1 from Omar Sandoval  ---
Patch sent:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630242.html