[Bug lto/99828] inlining failed in call to ‘always_inline’ ‘memcpy’: --param max-inline-insns-auto limit reached

2021-03-30 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99828

--- Comment #3 from Andi Kleen  ---
So what do you want to fix in the kernel? 

Use a wrapper for taking the address of the memcpy?
(I hope nothing in gcc would remove such a wrapper)
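A minimal sketch of the wrapper idea. It assumes, as the question implies, that the failure comes from code taking the address of an always_inline memcpy. The names memcpy_wrapper, copy_fn and copy_impl are hypothetical. Inside the wrapper memcpy is an ordinary direct call that can be inlined, and callers store the wrapper's address instead of &memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical out-of-line wrapper: its address can be taken safely, while
   the (possibly always_inline) memcpy inside it is a plain direct call.
   noinline asks GCC to keep the wrapper as a real function. */
static __attribute__((noinline)) void *memcpy_wrapper(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Code that previously stored &memcpy stores the wrapper instead. */
static const copy_fn copy_impl = memcpy_wrapper;
```

Whether a later IPA pass could still fold such a wrapper away is exactly the open question above; noinline is only one assumed safeguard.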

[Bug middle-end/99578] gcc-11 -Warray-bounds or -Wstringop-overread warning when accessing a pointer from integer literal

2021-05-01 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #12 from Andi Kleen  ---
It looks to me like separate bugs are mixed together here.

For example I looked at the preallocate_pmd warning again and I don't think
there is any union there. Also I noticed that when I replace the *foo[N] with
**foo it disappears. So I think that is something different.

So there seem to be instances where such warnings happen without union members.
Perhaps that one (and perhaps some others) need to be reanalyzed.

I also looked at the intel_pm.c and I think that one is a real kernel bug,
where the field accessed is really too small. I'll submit a patch for that.

[Bug lto/107014] flatten+lto fails the kernel build

2022-09-25 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107014

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #9 from Andi Kleen  ---
I suspect what happens is that the problem hits in some kernel initialization
function. If those functions don't use initcalls, the LTO build can inline them
all into each other (because each is only called once), creating a single big
initialization function. With flatten that becomes an extremely large function
that takes a long time to process.

I suspect any use of flatten would be better replaced with always_inline, since
that affects only a single function. This should probably be fixed upstream in
the kernel.
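A small sketch of the difference between the two attributes; the function names are hypothetical stand-ins for initialization code. flatten is placed on the caller and inlines everything reachable from it, recursively, while always_inline is placed on one callee and only forces that callee inline.

```c
/* Hypothetical helpers standing in for initialization routines. */
static int setup_a(int x) { return x * 3; }
static int setup_b(int x) { return x + 7; }

/* flatten: every call inside this function is inlined, recursively, so the
   size of the result grows with everything the function reaches. */
static __attribute__((flatten)) int init_flatten(int x)
{
    return setup_a(x) + setup_b(x);
}

/* always_inline: only this one callee is forced inline at its call sites;
   other calls (setup_b below) are left to the normal inliner heuristics. */
static inline __attribute__((always_inline)) int fast_path(int x)
{
    return x + 1;
}

static int init_targeted(int x)
{
    return fast_path(x) * 2 + setup_b(x);
}
```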

[Bug preprocessor/45227] libcpp Makefile does not enable instrumentation

2022-01-04 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45227

--- Comment #5 from Andi Kleen  ---
I think it was the method from the info file.

But I can't quite remember. If you cannot reproduce it I guess it's ok to
close. Maybe I made some mistake.

[Bug lto/107779] New: Support implicit references from inline assembler to compiler symbols

2022-11-20 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779

Bug ID: 107779
   Summary: Support implicit references from inline assembler to
compiler symbols
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org,
mliska at suse dot cz
  Target Milestone: ---

Created attachment 53933
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53933&action=edit
prototype patch

So I looked into the problem the kernel people complained about: a
lot of assembler statements reference C symbols, which need to be
externally_visible and global for gcc LTO; otherwise they can end up in
the wrong asm file and cause missing symbols.

I came up with the attached (hackish) patch that tries to solve the problem
very partially: it parses the assembler strings, looks for anything that
could be an identifier, and then tries to mark it externally_visible.
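To make the heuristic concrete, here is a rough sketch of an identifier scan over an asm template. It is not the attached patch; the function name, the MAX_IDENT limit and the skipping rules are assumptions. Operand references (%0, %[name]) and register names after '%' are skipped, numbers are skipped, and everything else that looks like a C identifier is reported, including mnemonics such as movl. Those are the expected false positives.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDENT 64 /* hypothetical limit; longer names are truncated */

/* Collect identifier-like tokens from an asm template into out[].
   Returns how many were stored (at most max_out). Very approximate. */
static size_t scan_asm_identifiers(const char *s, char out[][MAX_IDENT], size_t max_out)
{
    size_t n = 0;
    while (*s) {
        unsigned char c = (unsigned char)*s;
        if (c == '%') {
            /* Skip %%reg, %reg, %0 or %[name]. */
            s++;
            if (*s == '%')
                s++;
            if (*s == '[') {
                while (*s && *s != ']')
                    s++;
                if (*s)
                    s++;
            } else {
                while (isalnum((unsigned char)*s) || *s == '_')
                    s++;
            }
        } else if (isalpha(c) || c == '_') {
            const char *start = s;
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
            if (n < max_out) {
                size_t len = (size_t)(s - start);
                if (len >= MAX_IDENT)
                    len = MAX_IDENT - 1;
                memcpy(out[n], start, len);
                out[n][len] = '\0';
                n++;
            }
        } else if (isdigit(c)) {
            /* Numbers such as 42, 0x1f or local labels like 1b. */
            while (isalnum((unsigned char)*s))
                s++;
        } else {
            s++;
        }
    }
    return n;
}
```

Each reported name would then be looked up among the C declarations and, if found, marked externally_visible; unknown names are simply ignored.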

It has the following open issues:

- The parsing is very approximate and doesn't handle some obscure cases.
With the approximation it's also impossible to give error messages,
but hopefully the linker takes care of that.
It also gives false positives with some assembler syntax,
but in the worst case would just lose some optimization from unnecessary
references.

- It doesn't handle the case (which happens in the kernel) that the C
declaration is after the asm statement. This could be fixed with some
more effort.

- It doesn't work for static symbols, which can get mangled (that's a lot of
the kernel cases).
static is a difficult problem because there could be conflicting names,
so we cannot just put them all in partition zero.

This would need some special handling in the LTO partitioning code to
create new partitions just to get unique name spaces, and then
avoid mangling. A related problem is PR50676.

It's likely possible to construct situations that are impossible to
solve (circular dependencies, etc.), but I assume the non-LTO build
would fail in those cases too.

Or maybe do something with redefining symbols at the assembler level.

This one is somewhat difficult and I don't have a simple solution
currently. Unfortunately, solving the kernel issue would need a
solution for static.

[Bug middle-end/111743] New: shifts in bit field accesses don't combine with other shifts

2023-10-09 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

Bug ID: 111743
   Summary: shifts in bit field accesses don't combine with other
shifts
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

(not sure it's the middle-end, picked arbitrarily)

The following code

struct bf { 
unsigned a : 10, b : 20, c : 10;
};
unsigned fbc(struct bf bf) { return bf.b | (bf.c << 20); }


generates:

movq    %rdi, %rax
shrq    $10, %rdi
shrq    $32, %rax
andl    $1048575, %edi
andl    $1023, %eax
sall    $20, %eax
orl     %edi, %eax
ret

It doesn't understand that the shift right can be combined with the shift left.
Also not sure why the shift left is arithmetic (this should be all unsigned) 

clang does the simplification which ends up one instruction shorter:
movl    %edi, %eax
shrl    $10, %eax
andl    $1048575, %eax          # imm = 0xFFFFF
shrq    $12, %rdi
andl    $1072693248, %edi       # imm = 0x3FF00000
orl     %edi, %eax
retq
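The combination clang performs can be checked on the raw 64-bit value. This sketch assumes the x86-64 SysV layout implied by the GCC output: a is in bits 0-9, b in bits 10-29, and c starts a new 32-bit unit, so it sits in bits 32-41. Shifting c down by 32 and back up by 20 equals a single right shift by 12 followed by a mask of bits 20-29. The function names are hypothetical.

```c
#include <stdint.h>

/* What the GCC output computes: extract c, then shift it back up. */
static uint32_t fbc_two_shifts(uint64_t raw)
{
    uint32_t b = (uint32_t)(raw >> 10) & 0xFFFFF;
    uint32_t c = (uint32_t)(raw >> 32) & 0x3FF;
    return b | (c << 20);
}

/* Combined form, as in the clang output: (raw >> 32) << 20 collapses into
   one right shift by 12 plus a mask that keeps bits 20-29. */
static uint32_t fbc_combined(uint64_t raw)
{
    uint32_t b = (uint32_t)(raw >> 10) & 0xFFFFF;
    uint32_t c_shifted = (uint32_t)(raw >> 12) & 0x3FF00000;
    return b | c_shifted;
}
```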

[Bug middle-end/111743] shifts in bit field accesses don't combine with other shifts

2023-10-09 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

--- Comment #2 from Andi Kleen  ---
Okay, then it doesn't understand that SHL_signed and SHR_unsigned can be
combined when one of the values came from a shorter unsigned.

[Bug middle-end/111743] shifts in bit field accesses don't combine with other shifts

2023-10-09 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111743

--- Comment #5 from Andi Kleen  ---

config/i386/i386.h:#define SLOW_BYTE_ACCESS 0

You mean it doesn't define it?

[Bug testsuite/116163] RFE: add a linting tool for DejaGnu tests

2024-08-01 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116163

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #7 from Andi Kleen  ---
It seems instead of a linting script it would be better if dejagnu errored out
for the bad cases. Then everything would be caught always.

Right now it seems to ignore unknown/unparseable commands inside {} ?

[Bug tree-optimization/116166] risc-v (last) insn-emit-nn.c build takes hours

2024-08-01 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #5 from Andi Kleen  ---
Have you tried an LTO build? It can split large files.

It's not incremental however (unless the recent patches for that go in)

[Bug tree-optimization/115866] missed optimization vectorizing switch statements.

2024-08-01 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #3 from Andi Kleen  ---
Created attachment 58804
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58804&action=edit
handle switch in tree if-conversion

Here's a patch that passes simple test cases on x86_64. It adds recognition of
simple switches (like the ones generated by if to switch) to tree-ifcvt. Still
running a full test.
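For illustration, a loop of the shape this targets: the body is a small switch, which is also what if-to-switch produces from a condition chain like x == 1 || x == 2 || x == 7. The function name is hypothetical, and whether a given compiler version actually vectorizes it depends on whether the patch is applied.

```c
/* Hypothetical loop with a simple switch in its body. Every case only
   assigns a value, so tree-ifcvt could turn it into straight-line selects. */
static void classify(const int *a, int *out, int n)
{
    for (int i = 0; i < n; i++) {
        switch (a[i]) {
        case 1:
        case 2:
        case 7:
            out[i] = 1;
            break;
        default:
            out[i] = 0;
            break;
        }
    }
}
```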

[Bug ipa/116191] Avoid inlining in unlikely branches

2024-08-02 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116191

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #2 from Andi Kleen  ---
I suppose whether it's a good idea depends on the programming style. Sometimes
inlining enables constant propagation that collapses a lot of code, and you
definitely want that for cold code too.
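A small hypothetical example of that effect: a generic helper becomes a single operation once its mode argument is a known constant, even on the unlikely path. The names are illustrative only.

```c
/* Hypothetical generic helper: large in general, trivial for a constant mode. */
static inline int apply(int x, int mode)
{
    switch (mode) {
    case 0: return x;
    case 1: return x * 2;
    case 2: return -x;
    default: return x * x;
    }
}

static int process(int x, int err)
{
    if (__builtin_expect(err, 0))
        return apply(x, 2); /* cold path: if inlined, this folds to -x */
    return apply(x, 1);     /* hot path: if inlined, this folds to x * 2 */
}
```

Skipping inlining in the cold branch would leave a call plus the whole switch behind, which is larger than the inlined result.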

[Bug tree-optimization/116166] risc-v (last) insn-emit-nn.c build takes hours

2024-08-05 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166

--- Comment #13 from Andi Kleen  ---
Created attachment 58842
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58842&action=edit
add a param to limit BBs for dominator pass

Maybe something like this patch. It adds a check to disable the dom passes when
the number of BBs per function exceeds a threshold. By default it is disabled,
but you can set it with --param dom-bb-limit=1 or similar

[Bug tree-optimization/115866] missed optimization vectorizing switch statements.

2024-08-06 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

Andi Kleen  changed:

   What|Removed |Added

  Attachment #58804|0   |1
is obsolete||

--- Comment #4 from Andi Kleen  ---
Created attachment 58850
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58850&action=edit
Switch if conversion v2

This version has test cases and passes full testing.

[Bug target/116264] New: Spurious {NF}s in APX code

2024-08-06 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116264

Bug ID: 116264
   Summary: Spurious {NF}s in APX code
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---
Target: x86_64-linux

unsigned fclear(unsigned a, unsigned b)
{
if (a & (1 << 10))
b &= ~(1 << 20);
return b;
}

gives

cc1 -O2  tbitifconv.c -march=skylake  -mapxf -quiet

fclear:
.LFB1:
.cfi_startproc
{nf} andl   $-1048577, %esi, %eax
andl    $1024, %edi
cmove   %esi, %eax
ret
.cfi_endproc


The {nf} seems to be useless because this is the first instruction and there is
no live condition code. Of course it's just a bit set so doesn't cost anything
but it may point to more general problems in how {nf} is placed.

[Bug gcov-profile/71672] inlining indirect calls does not work with autofdo

2024-08-12 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71672

Andi Kleen  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Andi Kleen  ---
The patch was checked in some time ago.

[Bug c++/116285] Compilation of nodejs/v8's v8_base_without_compiler.runtime-temporal.cc is slow

2024-08-13 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116285

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #2 from Andi Kleen  ---
push_to_top_level is about 5% and seems to do a lot of list walking of
different scopes. Maybe a better data structure like a vector for the scopes
would help.

On my skylake it appears to be primarily Frontend Bound due to large code, so
you might get a slight improvement by using a profile feedback built host
compiler that does hot cold code splitting.

More than 3% is GC, so you could get some boost by increasing the GC limits so
it collects less often. Try playing with --param ggc-min-expand and --param
ggc-min-heapsize.

0.94% of the cycles are iterative_hash, so you might get another slight
improvement from https://github.com/andikleen/gcc/commits/rapidhash-1
which switches the hash function to something more modern
(still looking for supporting data that it actually helps).

But none of this will drastically cut the time, the profile is fairly flat.

# Overhead  Command  Source Shared Object  Source Symbol
# ........  .......  ....................  .............
#
     5.11%  cc1plus  cc1plus               [.] push_to_top_level()
     2.71%  cc1plus  cc1plus               [.] gt_ggc_mx_lang_tree_node(void*)
     1.00%  cc1plus  cc1plus               [.] ggc_set_mark(void const*)
     0.94%  cc1plus  cc1plus               [.] iterative_hash
     0.73%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
     0.72%  cc1plus  cc1plus               [.] iterative_hash_template_arg(tree_node*, unsigned int)
     0.67%  cc1plus  cc1plus               [.] ggc_internal_alloc(unsigned long, void (*)(void*), unsigned long, unsigned long)
     0.64%  cc1plus  cc1plus               [.] gt_ggc_mx_lang_tree_node(void*)
     0.54%  cc1plus  cc1plus               [.] ggc_set_mark(void const*)
     0.54%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]
     0.51%  cc1plus  cc1plus               [.] fields_linear_search(tree_node*, tree_node*, bool) [clone .isra.0]

[Bug target/116497] New: Need no_caller_saved_registers with SSE support

2024-08-26 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

Bug ID: 116497
   Summary: Need no_caller_saved_registers with SSE support
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: hjl.tools at gmail dot com
  Target Milestone: ---
Target: x86_64-linux

When writing threaded code interpreters by chaining functions with musttail the
normal ABI behavior of some caller saved registers can cause unnecessary spills
and fills compared to using indirect goto.

In principle this could be avoided by using no_caller_saved_registers on the
musttail called function, and perhaps no_callee_saved_registers on the function
that starts the interpretation chain to maintain the ABI on the interpreter
entry point.
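A minimal sketch of such a musttail-chained interpreter, with hypothetical opcodes and handler names. Each handler ends by tail-calling the handler for the next opcode through a dispatch table, so there is no central loop. The register-saving attributes discussed here are left out because they are x86-specific; the MUSTTAIL macro is an assumed convenience that only uses the statement attribute when the compiler advertises it, and otherwise relies on normal sibcall optimization.

```c
#include <stdint.h>

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

enum opcode { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

typedef int64_t (*handler_fn)(const unsigned char *pc, int64_t acc);

static int64_t op_inc(const unsigned char *pc, int64_t acc);
static int64_t op_dec(const unsigned char *pc, int64_t acc);
static int64_t op_double(const unsigned char *pc, int64_t acc);
static int64_t op_halt(const unsigned char *pc, int64_t acc);

/* Indexed by opcode; the order must match enum opcode. */
static const handler_fn dispatch_table[] = { op_inc, op_dec, op_double, op_halt };

/* Advance to the next opcode and tail-call its handler. */
#define NEXT(pc, acc) MUSTTAIL return dispatch_table[*(pc)]((pc), (acc))

static int64_t op_inc(const unsigned char *pc, int64_t acc)    { NEXT(pc + 1, acc + 1); }
static int64_t op_dec(const unsigned char *pc, int64_t acc)    { NEXT(pc + 1, acc - 1); }
static int64_t op_double(const unsigned char *pc, int64_t acc) { NEXT(pc + 1, acc * 2); }
static int64_t op_halt(const unsigned char *pc, int64_t acc)   { (void)pc; return acc; }

/* Entry point: the only place the chain is entered through a normal call,
   i.e. the function that would keep the regular ABI. */
static int64_t run(const unsigned char *program)
{
    return dispatch_table[program[0]](program, 0);
}
```

Any callee-saved register a handler touches has to be saved and restored in every handler under the normal ABI; that per-opcode push/pop overhead is what the attributes are meant to remove.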

But these attributes were designed for interrupt handlers and require disabling
SSE, because an interrupt handler really needs to preserve all registers. For
the interpreter case, which interacts with the normal ABI, it is fine to
clobber the SSE registers, as specified by the x86_64 SysV ABI.

So while disabling SSE can be done (and it is done in some real code today, see
[1] below), it is very inconvenient for an interpreter that may want to use SSE
for floating point etc.

So what we really need for the efficient musttail interpreters is
no_caller_saved_registers, but allow using SSE.

clang has a special ABI for this case (preserve_most[2]), but that seems
overkill.

There are two ways around this:

- We just remove the code that enforces no SSE for no_caller_saved_registers
(see the patch below). Interrupt handlers would then need to disable SSE
themselves, without an error, and trust that it isn't used by mistake.

I'm not sure there is much code that uses this (I couldn't find any).
Presumably such code would rather use the interrupt attribute anyway.

- We define a new attribute like no_caller_saved_registers except that it
allows using SSE.

[1] 
https://github.com/swoole/swoole-cli/blob/94ab97fbcfe39be8f5a985da82575bfb4c2319db/Zend/zend_string.c#L376

[2]
https://clang.llvm.org/docs/AttributeReference.html#preserve-most

[Bug target/116497] Need no_caller_saved_registers with SSE support

2024-08-26 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #1 from Andi Kleen  ---
Disable check for no_caller_saved_registers enforcing non FP.

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index f79257cc764..cec652cc9e6 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3639,8 +3639,8 @@ ix86_set_current_function (tree fndecl)
 reinit_regs ();

   if (cfun->machine->func_type != TYPE_NORMAL
-  || (cfun->machine->call_saved_registers
- == TYPE_NO_CALLER_SAVED_REGISTERS))
+  /* || (cfun->machine->call_saved_registers
+== TYPE_NO_CALLER_SAVED_REGISTERS) */)
 {
   /* Don't allow SSE, MMX nor x87 instructions since they
 may change processor state.  */

[Bug target/116497] Need no_caller_saved_registers with SSE support

2024-08-27 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #16 from Andi Kleen  ---
Created attachment 59013
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59013&action=edit
test case

This test case using Pinski's clobber trick shows the benefit.

If you compile with -O2 -mgeneral-regs-only the inc/dec opcodes don't save any
extra registers and generate nearly optimal code. If you make the
SAVE_REGS/DONT_SAVE_REGS macros empty they have a lot of extra push/pop, which
would ruin the interpreter loop.

-mgeneral-regs-only works for this case, but breaks SSE.

[Bug target/116497] Need no_caller_saved_registers with SSE support

2024-08-27 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #21 from Andi Kleen  ---
As HJ pointed out, the change is not needed: the compiler does the right thing
(DTRT) with no_callee_saved_registers on the callees.

[Bug target/116497] Need no_caller_saved_registers with SSE support

2024-08-27 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|WAITING |RESOLVED

[Bug tree-optimization/116500] gcc.dg/vect/vect-switch-ifcvt-1.c FAILs

2024-08-27 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116500

--- Comment #2 from Andi Kleen  ---
Do you have the dump file from tree-vect?

I guess it just doesn't vectorize something here.

The right fix is probably to skip it for sparc, or adjust the vect_int target
test.

[Bug tree-optimization/116500] gcc.dg/vect/vect-switch-ifcvt-1.c FAILs

2024-08-27 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116500

--- Comment #4 from Andi Kleen  ---

It seems sparc doesn't support comparisons in vectorization? 

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c:13:7:
missed:   not vectorized: relevant stmt not supported: _13 = _1 == 124;

The target check for vect_int is already just a list of hard-coded targets, so I
don't think we can do much better than an architecture skip. Otherwise we would
have to put probes for everything into the target tests.

proc check_effective_target_vect_int { } {
return [check_cached_effective_target_indexed vect_int {
  expr {
 [istarget i?86-*-*] || [istarget x86_64-*-*]
 || [istarget powerpc*-*-*]
 || [istarget amdgcn-*-*]
 || [istarget sparc*-*-*]
 || [istarget alpha*-*-*]
 || [istarget ia64-*-*]
 || [istarget aarch64*-*-*]
 || [is-effective-target arm_neon]
 || ([istarget mips*-*-*]
 && ([et-is-effective-target mips_loongson_mmi]
 || [et-is-effective-target mips_msa]))
 || ([istarget s390*-*-*]
 && [check_effective_target_s390_vx])
 || ([istarget riscv*-*-*]
 && [check_effective_target_riscv_v])
 || ([istarget loongarch*-*-*]
 && [check_effective_target_loongarch_sx])
}}]
}


Proposed patch:

diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
index f5352ef8ed7a..6d2a5ce52f20 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-skip-if "no support for vector comparison in optab" { sparc*-*-* } } */
 #include "tree-vect.h"

 extern void abort (void);

[Bug tree-optimization/116520] New: incorrect

2024-08-28 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520

Bug ID: 116520
   Summary: incorrect
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

[Bug tree-optimization/116520] Multiple condition lead to missing vectorization due to missing early break

2024-08-28 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520

Andi Kleen  changed:

   What|Removed |Added

Summary|incorrect   |Multiple condition lead to
   ||missing vectorization due
   ||to missing early break

--- Comment #1 from Andi Kleen  ---
const unsigned char *search_line_fast2 (const unsigned char *s,
const unsigned char *end)
{
  while (s < end) {
if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
  break;
s++;
  }
  return s;
}

compiled with -march=skylake-avx512 -fopt-info-all gives 

../gcc/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c:9:12: missed:
couldn't vectorize loop
../gcc/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c:9:12: missed:
not vectorized: unsupported control flow in loop.


due to this:

 /* Check if we have any control flow that doesn't leave the loop.  */
  class loop *v_loop = loop->inner ? loop->inner : loop;
  basic_block *bbs = get_loop_body (v_loop);
  for (unsigned i = 0; i < v_loop->num_nodes; i++)
if (EDGE_COUNT (bbs[i]->succs) != 1
&& (EDGE_COUNT (bbs[i]->succs) != 2
|| !loop_exits_from_bb_p (bbs[i]->loop_father, bbs[i])))
  {
free (bbs);
return opt_result::failure_at (vect_location,
   "not vectorized:"
   " unsupported control flow in loop.\n");
  }


But the control flow clearly leaves the loop.

At the gimple level it gives code like


  [local count: 110211694]:

   [local count: 1044213920]:
  # s_15 = PHI 
  _1 = *s_15;
  if (_1 > 63)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 522106960]:
  goto ; [100.00%]

   [local count: 522106960]:
  _14 = (int) _1;
  _17 = 9223372036854785024 >> _14;
  _18 = _17 & 1;
  _19 = _18 == 0;
  _12 = ~_19;

   [local count: 1044213920]:
  # prephitmp_4 = PHI <_12(4), 0(11)>
  _10 = _1 == 92;
  _13 = prephitmp_4 | _10;
  if (_13 != 0)
goto ; [8.03%]
  else
goto ; [91.97%]

  [local count: 83800317]:
  # s_5 = PHI 
  goto ; [100.00%]

   [local count: 960413605]:
  s_9 = s_15 + 1;
  if (end_7(D) > s_9)
goto ; [97.25%]
  else
goto ; [2.75%]

   [local count: 26411378]:
  # s_20 = PHI 
  goto ; [100.00%]

   [local count: 934002227]:
  goto ; [100.00%]



which seems to come from if to switch and then switch expansion?

[Bug c/116545] New: Support old style statement attributes

2024-08-30 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116545

Bug ID: 116545
   Summary: Support old style statement attributes
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Forked from PR83324. Applies to C/C++

It seems clang supports old-style __attribute__ statement attributes for musttail
(and presumably others), while gcc only supports the standard [[ ]] attributes.

With musttail working existing code that does

extern void foo(void);
void func(void)
{
#if __has_attribute(musttail)
__attribute__((musttail))
#endif
return foo();
}

builds on clang, but not on gcc. Bradley Lucier's Scheme compiler does this
(see https://gcc.gnu.org/pipermail/gcc-help/2024-August/143676.html).

In the short term we could make __has_attribute fail for just "musttail"
(shouldn't it be gnu::musttail or clang::musttail anyway?).

Or we could support the old-style attributes too, which might be more compatible.

[Bug target/116599] New: volatile generates unexpected RMW on global

2024-09-04 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116599

Bug ID: 116599
   Summary: volatile generates unexpected RMW on global
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

On x86_64-linux:

volatile int a;

void f1(void)
{
a++;
}

int b;

void f2(void)
{
b++;
}

generates

f1:
movl    a(%rip), %eax
addl    $1, %eax
movl    %eax, a(%rip)
ret

f2:
addl    $1, b(%rip)
ret


I would expect f1 to have the same code as f2.

[Bug tree-optimization/115866] missed optimization vectorizing switch statements.

2024-09-10 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866

--- Comment #8 from Andi Kleen  ---
It doesn't even try to convert the switch because of

t.c.179.ifcvt:
Can not ifcvt due to multiple exits


  if (loop->num_nodes > 2)
{
  /* More than one loop exit is too much to handle.  */
  if (!single_exit (loop))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Can not ifcvt due to multiple exits\n");
}
  else

So an early exit problem.

You can see the same problem even without a switch, for example:

short a[100];

int foo(int n, int counter)
{
   for (int i = 0; i < n; i++)
 {
if (a[i] == 1 || a[i] == 2)
return 1;
 }
return 0;
}

I don't think that should be handled in this bug.
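As a point of comparison, a sketch of the usual source-level workaround: scan the whole range and OR the conditions together, so the loop has a single exit. This trades the early return for always reading n elements; it is an assumed rewrite, not something proposed in the bug, and the unused counter parameter of the original is dropped.

```c
short a[100];

/* Original shape: the early 'return 1' gives the loop a second exit. */
static int foo_early_exit(int n)
{
    for (int i = 0; i < n; i++)
        if (a[i] == 1 || a[i] == 2)
            return 1;
    return 0;
}

/* Single-exit rewrite: no early break, so there is only one loop exit. */
static int foo_single_exit(int n)
{
    int found = 0;
    for (int i = 0; i < n; i++)
        found |= (a[i] == 1) | (a[i] == 2);
    return found;
}
```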

[Bug tree-optimization/116520] Multiple condition lead to missing vectorization due to missing early break

2024-09-12 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520

--- Comment #7 from Andi Kleen  ---
Tamas also gave this example in PR115866 which shows the same problem:

short a[100];

int foo(int n, int counter)
{
   for (int i = 0; i < n; i++)
 {
if (a[i] == 1 || a[i] == 2 || a[i] == 7 || a[i] == 4)
  return 1;
 }
return 0;
}

[Bug c++/115091] New: Support value speculation in frontend

2024-05-14 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115091

Bug ID: 115091
   Summary: Support value speculation in frontend
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

This blog post describes an interesting optimization technique for memory
access. https://mazzo.li/posts/value-speculation.html

A linked list walk is often limited by the latency of the L1 cache. When the
program can guess the next address (e.g. because the nodes are often allocated
sequentially in memory), it is possible to use a construct like

if (node->next == node + 1)
node++;
else
node = node->next;

and rely on the CPU speculating the fast case.

However this often runs into problems with the compiler, e.g. for

next = node->next;
node++;
if (node != next)
  node = next;

is often optimized away. While this can be worked around with some code
restructuring, this may not always work for more complex cases. I wonder if it
makes sense to formally support this technique with a "nocse" or similar
variable attribute that is honored by optimization passes.
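A sketch of the kind of restructuring that works today, under stated assumptions: in the predicted case the loop continues from node + 1, and an empty inline asm makes that value opaque so the compiler cannot prove it equals node->next and fold both arms into a plain load. The asm barrier is an assumed workaround, not necessarily the blog's method, and whether the branch survives (rather than becoming a cmov) should be checked in the generated assembly. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct node {
    struct node *next;
    int64_t value;
};

/* Sum a NULL-terminated list, guessing that the next node directly follows
   the current one in memory (e.g. nodes allocated from one array). */
static int64_t sum_list_speculative(const struct node *node)
{
    int64_t sum = 0;
    while (node) {
        sum += node->value;
        const struct node *next = node->next;
        if (__builtin_expect(next == node + 1, 1)) {
            /* Predicted case: continue from node + 1, which does not depend
               on the load of node->next. The empty asm hides the value from
               the optimizer so it cannot be replaced with 'next'. */
            node = node + 1;
            __asm__("" : "+r"(node));
        } else {
            node = next;
        }
    }
    return sum;
}
```

The result is always correct, because the fast path is only taken when node + 1 equals node->next; the speculation only affects how soon the CPU can start the next iteration.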

[Bug target/115255] New: sibcall at -O0 causes ICE in df_refs_verify on arm

2024-05-28 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115255

Bug ID: 115255
   Summary: sibcall at -O0 causes ICE in df_refs_verify on arm
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Note: to trigger this bug, tree-tailcall needs to be modified to run at -O0
(which is done for musttail), e.g. by the patches here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/thread.html#652312

On ARM targets with -O0 I see

int __attribute__((noinline,noclone,noipa))callee (int i);

int __attribute__((noinline,noclone,noipa))
caller (int i)
{
  [[gnu::musttail]] return callee (i + 1);
}

../musttail1.c: In function ‘caller’:
../musttail1.c:14:1: internal compiler error: in df_refs_verify, at
df-scan.cc:4022
   14 | }
  | ^
0x87d483 df_refs_verify
../../gcc/gcc/df-scan.cc:4022
0xaa7d56 df_insn_refs_verify
../../gcc/gcc/df-scan.cc:4101
0xaa996b df_bb_verify
../../gcc/gcc/df-scan.cc:4138
0xaa9ce7 df_scan_verify()
../../gcc/gcc/df-scan.cc:4259
0xa979c2 df_verify()
../../gcc/gcc/df-core.cc:1834
0xa97a2a df_analyze_1
../../gcc/gcc/df-core.cc:1221
0xa97c31 df_analyze()
../../gcc/gcc/df-core.cc:1305
0x12607f0 thumb2_reorg
../../gcc/gcc/config/arm/arm.cc:19392
0x127522b arm_reorg
../../gcc/gcc/config/arm/arm.cc:19611
0xea6eab execute
../../gcc/gcc/reorg.cc:3931


arm_reorg is just before sched2.

The assert happens because the REG/REGNO of the new df chain changes when it is
rebuilt (differs by one from the old). Presumably changed by ipa or reload
since there are very few passes at -O0. I'm not fully sure which data structure
is stale.

I tried to add the df_analyze in the same position on x86, but the problem is
not triggered there.

[Bug target/115255] sibcall at -O0 causes ICE in df_refs_verify on arm

2024-06-01 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115255

--- Comment #4 from Andi Kleen  ---
Created attachment 58323
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58323&action=edit
hack patch to fix arm sibcalls at -O0

The attached patch makes the test case pass on arm. 

- Some of the sibcall-related code in "pro_and_epilogue" was dependent on
optimize. We should probably add some cfun flag that indicates musttail? I
enabled it unconditionally for now.
- The thumb2_reorg code corrupts the DF chains for sibcalls when not
optimizing. I just disabled it for now; it shouldn't be needed at -O0 since
it's just a size optimization.

This may not be clean enough to post however.

I'm a bit afraid that other ports may break too, will do some testing.

[Bug target/115255] sibcall at -O0 causes ICE in df_refs_verify on arm

2024-06-01 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115255

--- Comment #6 from Andi Kleen  ---
Created attachment 58324
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58324&action=edit
patch to fix arm sibcalls with -O0

Better patch that uses the existing cfun flag for tail calls.

[Bug target/115255] sibcall at -O0 causes ICE in df_refs_verify on arm

2024-06-01 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115255

--- Comment #7 from Andi Kleen  ---
The patch can be even more minimized. The thumb2_reorg change is not needed
because nothing does df_verify() after it (I just noticed it because I added
some extra for debugging). So even though thumb2_reorg breaks the DF with -O0
it doesn't matter because nobody cares.

[Bug lto/107779] Support implicit references from inline assembler to compiler symbols

2023-10-15 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779

--- Comment #4 from Andi Kleen  ---
This whole manual annotation idea (which is equivalent to marking the symbols
global and visible, and is what a large part of the kernel LTO patchkit does) is
dead on arrival because the kernel people already rejected it. Their argument
is that they don't need it for LLVM, so why should they be forced to do it for
GCC? In LLVM it is just done by the assembler, and it works without any extra
program changes.

Since gcc is not the only game in town anymore they have a point.

It's either heuristics or integrating the assembler.

[Bug gcov-profile/113765] New: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-02-05 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

Bug ID: 113765
   Summary: autofdo: val-profiler-threads-1.c compilation,  error:
probability of edge from entry block not initialized
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

With recent trunk (019dc63819be)

When running the test suite on an Intel system with autofdo installed:

Executing on host: /home/ak/gcc/obj-full/gcc/xgcc -B/home/ak/gcc/obj-full/gcc/
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
-fdiagnostics-plain-output -O0 -pthread -fprofile-update=atomic
-fauto-profile=/home/ak/gcc/obj-full/gcc/testsuite/gcc20/afdo.val-profiler-threads-1.gcda
-DFOR_AUTOFDO_TESTING -fearly-inlining -dumpbase-ext .x02 -lm -o
/home/ak/gcc/obj-full/gcc/testsuite/gcc20/val-profiler-threads-1.x02
(timeout = 300)
spawn -ignore SIGHUP /home/ak/gcc/obj-full/gcc/xgcc -B/home/ak/gcc/obj-full/gcc/
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
-fdiagnostics-plain-output -O0 -pthread -fprofile-update=atomic
-fauto-profile=/home/ak/gcc/obj-full/gcc/testsuite/gcc20/afdo.val-profiler-threads-1.gcda
-DFOR_AUTOFDO_TESTING -fearly-inlining -dumpbase-ext .x02 -lm -o
/home/ak/gcc/obj-full/gcc/testsuite/gcc20/val-profiler-threads-1.x02
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c: In function 'copy_memory':
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7: error: probability of edge from entry block not initialized
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7: error: probability of edge 2->4 not initialized
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7: error: probability of edge 5->1 not initialized
during GIMPLE pass: fixup_cfg
/home/ak/gcc/gcc/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c:13:7: internal compiler error: verify_flow_info failed
0xafb91e verify_flow_info()
../../gcc/gcc/cfghooks.cc:287
0xf0e8a7 execute_function_todo
../../gcc/gcc/passes.cc:2100
0xf0edde execute_todo
../../gcc/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
compiler exited with status 1

I'm not attaching the source because it also needs the autofdo gcov file to
reproduce and the test case is already in tree.

[Bug gcov-profile/113765] autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-02-05 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

--- Comment #1 from Andi Kleen  ---
This seems to be a regression: I tested the same setup with gcc 13 and the test
passes there:

55:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation,  -fprofile-generate -D_PROFILE_GENERATE
59:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution,    -fprofile-generate -D_PROFILE_GENERATE
62:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation,  -fprofile-use -D_PROFILE_USE
66:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution,    -fprofile-use -D_PROFILE_USE
76:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation,  -g -DFOR_AUTOFDO_TESTING
108:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution,    -g -DFOR_AUTOFDO_TESTING
111:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c compilation,  -fauto-profile -DFOR_AUTOFDO_TESTING -fearly-inlining
115:PASS: gcc.dg/tree-prof/val-profiler-threads-1.c execution,    -fauto-profile -DFOR_AUTOFDO_TESTING -fearly-inlining

[Bug gcov-profile/113765] ICE: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-02-05 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765

--- Comment #3 from Andi Kleen  ---
-O1 fixes it, so an easy patch would be to skip the auto-profile pass at -O0:

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 63d0c3dc36df..180ed7a8260f 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1758,7 +1758,7 @@ public:
   bool
   gate (function *) final override
   {
-    return flag_auto_profile;
+    return flag_auto_profile && optimize > 0;
   }
   unsigned int
   execute (function *) final override

[Bug lto/80379] Redundant note: code may be misoptimized unless -fno-strict-aliasing is used

2024-06-11 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80379

--- Comment #5 from Andi Kleen  ---
This bug is about printing an unnecessary message. If your code is actually
miscompiled even with -fno-strict-aliasing set (so the option is ignored
somewhere), that is a different problem, and you would need to provide a test
case so it can be debugged.

[Bug rtl-optimization/30688] Branch registers loaded too late on ia64

2024-06-11 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30688

Andi Kleen  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #7 from Andi Kleen  ---
ia64 is obsolete

[Bug middle-end/63556] gcc should dedup string postfixes

2024-06-11 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63556

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|NEW |RESOLVED

--- Comment #3 from Andi Kleen  ---
Since the linker already does this, the change is not needed.

[Bug target/80742] attribute target no- does not work

2024-06-11 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80742

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |WORKSFORME
 Status|WAITING |RESOLVED

[Bug c/82013] better error message for missing semicolon in prototype

2024-06-11 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82013

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|ASSIGNED|RESOLVED

--- Comment #8 from Andi Kleen  ---
Duplicate

*** This bug has been marked as a duplicate of bug 68615 ***

[Bug c++/68615] Unhelpful location when missing a semi-colon on a function declaration at the end of a header

2024-06-11 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68615

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #9 from Andi Kleen  ---
*** Bug 82013 has been marked as a duplicate of this bug. ***

[Bug middle-end/63556] gcc should dedup string postfixes

2024-06-13 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63556

--- Comment #7 from Andi Kleen  ---
I'm not sure how it would speed up the linker if gcc did it. The linker would
still need to do it, because there might be matches between strings in
different .o files. Also, the linker wouldn't know whether the compiler had
already done it or not.
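String postfix (tail) merging relies on the fact that a NUL-terminated string which is a suffix of another can simply point into the longer one. Below is a minimal C sketch of the idea for a single pair of strings. `share_suffix` and `is_suffix` are hypothetical helpers, not GCC or linker code:

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if `tail` is a suffix of `s` (both NUL-terminated). */
static int is_suffix(const char *s, const char *tail)
{
    size_t ls = strlen(s), lt = strlen(tail);
    return lt <= ls && memcmp(s + (ls - lt), tail, lt) == 0;
}

/* Return a pointer that can stand in for `tail`: a pointer into `s` when
   `tail` is a suffix of `s` (so its bytes need not be stored separately),
   otherwise `tail` itself. */
const char *share_suffix(const char *s, const char *tail)
{
    return is_suffix(s, tail) ? s + (strlen(s) - strlen(tail)) : tail;
}
```

Because the shared tail includes the terminating NUL, the pointer into the longer string is a valid C string on its own. A linker can apply the same idea across all object files, which a single compilation unit cannot see.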

[Bug rtl-optimization/113723] switch (jump) tables don't get along with -freorder-blocks-and-partition on non-x86

2024-06-13 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113723

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #3 from Andi Kleen  ---
Even x86 has similar problems in some cases; see PR50676.

[Bug tree-optimization/115484] New: AVX vectorization is limited to 3 comparisons

2024-06-13 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115484

Bug ID: 115484
   Summary: AVX vectorization is limited to 3 comparisons
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Seen with current trunk, but also with older gcc versions:

int f(char *s)
{
    int c = 0;
    int i;
    for (i = 0; i < 64; i++) {
        c |= (*s == ',' || *s == '|' || *s == '!' /* || *s == '*' */);
        s++;
    }
    return c;
}

vectorizes with -O3 -mavx2 -fopt-info-optall-all tvcmp.c:

tvcmp.c:6:16: optimized: loop vectorized using 16 byte vectors
tvcmp.c:2:5: note: vectorized 1 loops in function.
tvcmp.c:7:10: optimized: loop with 3 iterations completely unrolled (header execution count 16535624)

but when the fourth comparison is uncommented, it doesn't:

BB 3 is always executed in loop 1
loop 1's coldest_outermost_loop is 1, hotter_than_inner_loop is NULL
tvcmp.c:6:16: missed: couldn't vectorize loop
tvcmp.c:6:16: missed: not vectorized: unsupported control flow in loop.
tvcmp.c:2:5: note: vectorized 0 loops in function.
tvcmp.c:10:9: note: * Analysis failed with vector mode V16QI
tvcmp.c:10:9: note: * Skipping vector mode V16QI, which would repeat the analysis for V16QI
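One possible source-level workaround is to combine the comparisons with bitwise | instead of ||. That leaves no short-circuit control flow that could be turned into a switch. This is an untested assumption, not a verified fix: whether GCC then vectorizes all four comparisons has not been checked here. `f_branchless` is a hypothetical name. It computes the same result as `f` with the fourth comparison enabled:

```c
/* Same result as f() with all four comparisons enabled. Each (s[i] == x)
   is 0 or 1, and | (unlike ||) evaluates every operand, so the loop body
   contains no branches. */
int f_branchless(const char *s)
{
    int c = 0;
    for (int i = 0; i < 64; i++)
        c |= (s[i] == ',') | (s[i] == '|') | (s[i] == '!') | (s[i] == '*');
    return c;
}
```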

[Bug c/115496] RFE: new warning to detect suspicious multline string literals

2024-06-14 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115496

--- Comment #2 from Andi Kleen  ---
It would need a heuristic that doesn't trigger when the line nearly fills a
standard line length (however that is defined). Otherwise people who break a
string because of line-length limits might trigger it incorrectly.
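Below is a minimal sketch of such a heuristic. It assumes the front end can supply the contents of the first literal and the length of the source line holding it. `suspicious_concat`, the 8-column slack, and the set of "separator" characters are illustrative assumptions, not an existing GCC interface:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical heuristic. `first` is the contents of a string literal that
   is directly followed by another literal on the next source line;
   `line_len` is the length of the source line holding `first`. */
bool suspicious_concat(const char *first, size_t line_len, size_t max_line_len)
{
    size_t n = strlen(first);
    if (n == 0)
        return false;
    char last = first[n - 1];
    /* Literals ending in a separator were probably split on purpose. */
    if (last == '\n' || last == ' ' || last == '\t' || last == ';')
        return false;
    /* Lines that nearly fill the limit were probably split for length
       (the 8-column slack is an arbitrary assumption). */
    if (line_len + 8 >= max_line_len)
        return false;
    return true;
}
```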

[Bug c/115496] RFE: new warning to detect suspicious multiline string literals

2024-06-14 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115496

--- Comment #3 from Andi Kleen  ---
When writing inline assembler, an alternative to \n is to use ; as the
separator,

e.g.

asm("movl $1,%eax ; "
    "movl %eax,%ebx");

There can also be a comment mistake here, like

asm("movl $1,%eax # comment ;"
    "movl %eax,%ebx");

This incorrectly drops the second instruction: the # comment runs to the end
of the line, so it swallows the ; and everything after it. The \n warning
wouldn't handle that case; it would need knowledge about # comments.

I've hit that problem in real code. I've always preferred writing ; over \n
because it looks less ugly.
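Below is a rough sketch of the kind of check that would catch this. It assumes x86 GNU as syntax, where # starts a comment that runs to the end of the line. On targets such as ARM, # introduces immediates, so a real check would need per-target knowledge. `asm_comment_hides_code` is hypothetical, and it ignores quoting inside the assembler text:

```c
#include <stdbool.h>

/* Returns true if, after a '#' comment starts, a ';' is followed by more
   non-blank text on the same line: text that was probably meant to be
   another instruction but is swallowed by the comment. */
bool asm_comment_hides_code(const char *s)
{
    bool in_comment = false;
    for (; *s; s++) {
        if (*s == '\n') {
            in_comment = false;          /* comment ends at end of line */
        } else if (*s == '#') {
            in_comment = true;
        } else if (in_comment && *s == ';') {
            const char *p = s + 1;
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p != '\0' && *p != '\n')
                return true;
        }
    }
    return false;
}
```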

[Bug c/115496] RFE: new warning to detect suspicious multiline string literals

2024-06-14 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115496

--- Comment #6 from Andi Kleen  ---
Yes, a # check would need to be target-dependent.

[Bug c/83324] [feature request] Pragma or special syntax for guaranteed tail calls

2024-06-19 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83324

--- Comment #14 from Andi Kleen  ---
Latest patchkit is here, but stalled due to lack of reviewers:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653319.html

[Bug bootstrap/115584] New: Boot strap comparison failure on trunk with --enable-checking=release

2024-06-21 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115584

Bug ID: 115584
   Summary: Boot strap comparison failure on trunk with
--enable-checking=release
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

On current trunk I get a bootstrap comparison failure between stages 2 and 3
with --enable-checking=release on x86_64-linux. --enable-release=checking is
fine.

I've seen it with

5320bcbd342a (origin/trunk, origin/master, origin/HEAD, trunk) xstormy16: Fix xs_hi_nonmemory_operand
and
a84fe222029f (HEAD) testsuite: check that generated .sarif files validate against the SARIF schema [PR109360]

(the latter was tested only without make clean; I'm not sure whether that
affects bootstrap errors)

[Bug bootstrap/115584] Boot strap comparison failure on trunk with --enable-checking=release

2024-06-21 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115584

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andi Kleen  ---
Hmm, I cannot reproduce it on another machine. Maybe it's a problem with my
system. Closing for now.

[Bug tree-optimization/115484] [13/14/15 regression] if-to-switch prevents AVX vectorization

2024-06-21 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115484

--- Comment #6 from Andi Kleen  ---
As an interesting but irrelevant side note, clang seems to have the same bug.

[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues

2024-06-21 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099
Bug 85099 depends on bug 63384, which changed state.

Bug 63384 Summary: scheduler loops on endless fence list with -fselective-scheduling2 on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/63384] scheduler loops on endless fence list with -fselective-scheduling2 on x86

2024-06-21 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Andi Kleen  ---
Long fixed.

[Bug middle-end/115606] New: return slot opt prevents tail calls

2024-06-23 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115606

Bug ID: 115606
   Summary: return slot opt prevents tail calls
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

For the following test case 

class Foo {
public:
  int a, b;
  Foo(int a, int b) : a(a), b(b) {}
};

Foo __attribute__((noinline,noclone,noipa))
callee (int i)
{
  return Foo(i, i+1);
}

Foo __attribute__((noinline,noclone,noipa))
caller (int i)
{
  return callee (i + 1);
}

x86 can sibling call the callee call, but ARM can't.

The difference seems to be that ARM ends up with 

   = callee (_1); [return slot optimization]

while x86_64 just has

  D.2926 = callee (_1);

and then in tree-tailcall this test fails with return slot optimization:

  tree result_decl = DECL_RESULT (cfun->decl);
  if (result_decl
      && may_be_aliased (result_decl)
      && ref_maybe_used_by_stmt_p (call, result_decl, false))

resulting in no sibling call.

This causes a test suite failure in the musttail patchkit, but only on ARM.

Of course the test could be disabled on ARM, but I wonder whether there is a
missed optimization here. With more musttail usage it might become a bigger
problem if not even simple structures are supported.

I'm not sure:
- why is return slot optimization at the GIMPLE level target-specific?
- can tree-tailcall handle this case?

[Bug tree-optimization/115344] Missing loop counter reversal

2024-06-23 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115344

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #2 from Andi Kleen  ---

If you go back far enough, the RTL loop optimizer used to support that. It
broke at some point, even before that optimizer was removed.

>so the doloop candidate isn't added?

x86 doesn't have hardware do-loops; they exist only on some more obscure
targets, such as some DSPs.

[Bug middle-end/115606] return slot opt prevents tail calls

2024-06-23 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115606

Andi Kleen  changed:

   What|Removed |Added

 Target|arm-*-* |
 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #4 from Andi Kleen  ---
Yes, I know it comes from the frontend; the question is why that is
target-specific.

The only dependency I see is on PCC_STRUCT_RETURN, but neither x86 nor arm
seems to set that define.

Anyway, I guess it doesn't really matter why the frontend does that; it still
breaks sibcalls.

[Bug tree-optimization/115344] Missing loop counter reversal

2024-06-24 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115344

--- Comment #4 from Andi Kleen  ---
Pedantry aside, the basic problem is that the doloop optimization depends on
the target supporting doloop, while loop reversal would be useful everywhere.

So there are two options: add doloop support to every target of interest, or
make the reversal optimization independent of it.

[Bug tree-optimization/115344] Missing loop counter reversal

2024-06-24 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115344

--- Comment #5 from Andi Kleen  ---
Another problem is that the doloop optimization only handles known bounds,
while generic reversal works for unknown bounds too.
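To make the transformation concrete, here is a minimal sketch of the two loop shapes. The function names are hypothetical. The reversed form keeps the same memory access order, but counts a separate induction variable down to zero. The exit test then becomes a comparison against zero, which on x86 can reuse the flags set by the decrement. It works for a run-time bound `n` as well:

```c
#include <stddef.h>

/* Up-counting form: the exit test compares i against n on every iteration. */
long sum_up(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Reversed-counter form: k counts down to zero while the pointer still
   walks forward, so the exit test is a compare against zero. */
long sum_reversed_counter(const long *a, size_t n)
{
    long sum = 0;
    for (size_t k = n; k != 0; k--)
        sum += *a++;
    return sum;
}
```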

[Bug preprocessor/79465] infinite #include cycle is not detected

2024-06-26 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79465

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #3 from Andi Kleen  ---
I don't think this is just a dup; the error was only a side comment, and
detecting include cycles would likely be a useful enhancement.

However, it's not clear how to detect them, because there are valid cases
where a header is included recursively, e.g. when some macro changes between
inclusions. It might be challenging to distinguish valid and invalid cases.

[Bug tree-optimization/115274] Bogus -Wstringop-overread in SQLite source code

2024-06-28 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115274

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #7 from Andi Kleen  ---
Doesn't reproduce for me on recent trunk. So maybe already fixed.

The file is useful as a general run test case for the compiler though.

[Bug tree-optimization/115274] Bogus -Wstringop-overread in SQLite source code

2024-06-28 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115274

--- Comment #8 from Andi Kleen  ---
Ah, never mind. I ran it with the wrong option; with -O3 it shows the warning.
Unfortunately the run time is very long, so it will be difficult to minimize.

[Bug tree-optimization/115274] Bogus -Wstringop-overread in SQLite source code

2024-06-29 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115274

--- Comment #10 from Andi Kleen  ---
-fno-thread-jumps fixes it, so it's probably a dup of PR109071 (the same
problem with a different warning).

[Bug tree-optimization/115813] New: missing constant evaluation for vectors

2024-07-06 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115813

Bug ID: 115813
   Summary: missing constant evaluation for vectors
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

typedef int v4si __attribute__((vector_size(16)));

v4si v(v4si x)
{
    x = (x << 1) | 1;
    x = (x << 1) | 1;
    return x;
}

gives

movdqa  .LC0(%rip), %xmm1
pslld   $1, %xmm0
por %xmm1, %xmm0
pslld   $1, %xmm0
por %xmm1, %xmm0
ret


but I would have expected the compiler to combine the two shifts and ORs, as it
does for scalar operations.
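For reference, the combination expected here is the identity ((x << 1) | 1) << 1 | 1 == (x << 2) | 3. It holds lane by lane because a left shift distributes over OR. Below is a small sketch using the same GCC vector extension that checks the folded form against the original. The function names are hypothetical:

```c
typedef int v4si __attribute__((vector_size(16)));

/* The two steps from the report. */
v4si v_two_steps(v4si x)
{
    x = (x << 1) | 1;
    x = (x << 1) | 1;
    return x;
}

/* Folded form: ((x << 1) | 1) << 1 | 1 == (x << 2) | 2 | 1 == (x << 2) | 3 */
v4si v_folded(v4si x)
{
    return (x << 2) | 3;
}
```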

[Bug tree-optimization/115813] missing constant evaluation for vectors

2024-07-06 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115813

--- Comment #2 from Andi Kleen  ---
Is that the right pattern for the example? It looks different.

Enabling match.pd debugging for the scalar version shows:

taddbit.c.034t.ccp1:Applying pattern match.pd:3960, gimple-match.cc:18437
taddbit.c.034t.ccp1:Applying pattern match.pd:3760, gimple-match.cc:14134
taddbit.c.035t.forwprop1:Applying pattern match.pd:3960, gimple-match.cc:18437
taddbit.c.035t.forwprop1:Applying pattern match.pd:3760, gimple-match.cc:14134
taddbit.c.035t.forwprop1:Applying pattern match.pd:1880, gimple-match.cc:34062

Anyway, I suspect the general problem applies to many patterns in match.pd.

[Bug tree-optimization/115979] New: Implicitly generated C++ calls stop musttail search early

2024-07-17 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115979

Bug ID: 115979
   Summary: Implicitly generated C++ calls stop musttail search
early
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

(this bug requires committing the remaining pieces of musttail) 

When running gcc/testsuite/g++.dg/musttail11.C with -O2, we see some "cannot
tail-call: other reasons" errors.

The problem is that find_tail_calls stops at the last call before the return.
But the C++ frontend generates destructor calls, or calls to conversion
operators such as operator int, after the user-written tail call, and these
stop the search early. While these calls cannot be tail-called anyway because
of the extra code, they should get a proper error message, such as "code after
calls". But when tree-tailcall misses a call, only expand can print the not
very helpful "other reasons" error.

Possible fixes: 
- When a function has f->has_musttail set but no call is found search again,
but don't stop at the first call.
- Or maybe if f->has_musttail is set just keep searching
- Or add some way for the frontend to indicate a call is generated implicitly
and can be skipped.

[Bug target/115255] sibcall at -O0 causes ICE in df_refs_verify on arm

2024-07-18 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115255

Andi Kleen  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/115979] Implicitly generated C++ calls stop musttail search early

2024-07-18 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115979

--- Comment #3 from Andi Kleen  ---
Doing it in the frontend would require at least some duplication between C and
C++, wouldn't it?

I was thinking of just continuing the search if has_musttail is set, but I was
wary of endless loops when walking single-predecessor basic blocks. But I
presume that if such loops could happen, they could already happen today.
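Below is a minimal sketch of how such a backwards walk can be made to terminate even if single-predecessor blocks form a cycle, using a visited flag. `struct block` and `find_musttail_block` are hypothetical simplifications, not GCC's basic_block API. A real implementation would clear the flags (or use a bitmap) between walks:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified CFG node. */
struct block {
    struct block *single_pred;  /* NULL if zero or several predecessors */
    bool has_musttail_call;
    bool visited;
};

/* Walk backwards through single-predecessor blocks looking for the block
   that holds the musttail call. Marking blocks as visited guarantees the
   walk stops even if the predecessor chain loops back on itself. */
struct block *find_musttail_block(struct block *bb)
{
    for (; bb && !bb->visited; bb = bb->single_pred) {
        bb->visited = true;
        if (bb->has_musttail_call)
            return bb;
    }
    return NULL;
}
```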

[Bug c/83324] [feature request] Pragma or special syntax for guaranteed tail calls

2024-07-18 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83324

--- Comment #19 from Andi Kleen  ---
The middle-end and back-end parts are in; the C/C++ frontend parts still need
acks.

[Bug c++/116019] New: Incorrect cannot-tail messages on targets

2024-07-20 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116019

Bug ID: 116019
   Summary: Incorrect cannot-tail messages on targets
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

(this bug requires the musttail patchkit to be committed)

This is related to PR115606.

On targets like ARM, where the C++ frontend prevents tail calls that return
structures, we also get incorrect tail call errors.

For example, running cc1plus on testsuite/c-c++-common/musttail12.c gives

../gcc/gcc/testsuite/c-c++-common/musttail12.c: In function ‘str cstruct(int)’:
../gcc/gcc/testsuite/c-c++-common/musttail12.c:13:38: error: cannot tail-call: tail call must be same type

but the types are the same, and the call works on x86. The error comes from
this check:

  tree result_decl = DECL_RESULT (cfun->decl);
  if (result_decl
      && may_be_aliased (result_decl)
      && ref_maybe_used_by_stmt_p (call, result_decl, false))
    {
      maybe_error_musttail (call, _("tail call must be same type"));
      return;
    }

So this error message needs to be fixed to handle such an implicit case.

[Bug gcov-profile/83355] autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE

2024-07-22 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355

Andi Kleen  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Andi Kleen  ---
Long fixed in tree

[Bug preprocessor/116047] C preprocessor bug

2024-07-23 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116047

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #2 from Andi Kleen  ---
I suppose it should give a warning when this happens.

[Bug target/116014] Missed optimization opportunity: inverted shift count

2024-07-23 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116014

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #1 from Andi Kleen  ---
Is that from some real code? Why would a programmer write shifts like that?

[Bug other/116080] New tests from r15-2233-g8d1af8f904a0c0 fail

2024-07-24 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080

--- Comment #1 from Andi Kleen  ---
Yes, it is known that powerpc (or some flavors of it) has poor tail call
support due to ABI limitations.

I just need to figure out how to skip the test. I guess it needs a better test
in check_effective_target_tail_call.

Maybe this patch will help. The drawback is that it will disable all tail call
testing on these targets.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d368251ef9a4..eaa9d1642194 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12739,7 +12739,8 @@ proc check_effective_target_frame_pointer_for_non_leaf { } {
 # most trivial type.
 proc check_effective_target_tail_call { } {
 return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
-   __attribute__((__noipa__)) void foo (void) { }
+   // C++
+   extern __attribute__((__noipa__)) void foo (void);
__attribute__((__noipa__)) void bar (void) { foo(); }
 } {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
 }

[Bug other/116080] New tests from r15-2233-g8d1af8f904a0c0 fail

2024-07-24 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080

--- Comment #2 from Andi Kleen  ---
Also can you upload the whole log files somewhere? I would like to see what the
output of check_effective_target_struct_tail_call is. It should have caught
some of these problems.

[Bug c/116087] New: Add optional warning for too large macro expansion

2024-07-25 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116087

Bug ID: 116087
   Summary: Add optional warning for too large macro expansion
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

The Linux kernel hit an interesting problem where an overly complicated nested
macro expansion caused significant compile-time slowdowns.

https://lore.kernel.org/lkml/1877ab0f14cf4f7d9da2a53e211cf...@acums.aculab.com/T/

To catch such problems more easily, it would be useful to have an optional
warning that fires when a macro expands to more text than a configurable size.
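Below is a contrived illustration of how nested macros can grow much faster than the source text. These are not the kernel's actual macros, and the names are hypothetical. Each level passes its argument twice, so the expansion doubles per nesting step within a level: `D4(x)` is four short lines of source but expands to 256 copies of `x`:

```c
/* Each macro uses its argument twice, so nesting multiplies the expansion. */
#define D1(x) ((x) + (x))   /* 2 copies of x   */
#define D2(x) D1(D1(x))     /* 4 copies of x   */
#define D3(x) D2(D2(x))     /* 16 copies of x  */
#define D4(x) D3(D3(x))     /* 256 copies of x */

int expand_demo(int x)
{
    return D4(x);           /* evaluates to 256 * x */
}
```

A size-based warning of the kind requested would flag `D4` here, even though its definition looks harmless.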

[Bug testsuite/116080] New tests from r15-2233-g8d1af8f904a0c0 fail

2024-07-25 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080

--- Comment #6 from Andi Kleen  ---
Created attachment 58761
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58761&action=edit
Improve test suite tail call checks

This patch should fix it. We must run the test suite tail call probes without
optimization to match the actual tests. I also generalized the powerpc checks
into a new external_tail_call target probe.

[Bug c++/116019] Incorrect cannot-tail messages on targets

2024-07-28 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116019

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andi Kleen  ---
Made the message more vague in trunk.

[Bug tree-optimization/116126] New: vectorize libcpp search_line_fast

2024-07-29 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116126

Bug ID: 116126
   Summary: vectorize libcpp search_line_fast
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

This is something of a metabug to track vectorization of libcpp/lex.c
search_line_fast, which currently has manual vectorization for various
architectures. It would be better if gcc could just figure it out by itself.

The definition is:

// Find any occurrence of \n \r \\ ? and return a pointer to it.
// Can assume that the string ends with \n, so end can be ignored

const unsigned char *search_line_fast (const unsigned char *s, const unsigned char *end)
{
    for (;;) {
        if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
            return s;
        s++;
    }
}

currently this fails due to

- Vectorizer cannot determine number of iterations
- Early return is not supported
- if-to-switch creates an unsupported switch

If we don't ignore end, to work around the uncountable problem, we get:

const unsigned char *search_line_fast2 (const unsigned char *s, const unsigned char *end)
{
    while (s < end) {
        if (*s == '\n' || *s == '\r' || *s == '\\' || *s == '?')
            return s;
        s++;
    }
    return end;  /* no match before end */
}
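One possible restructuring, offered only as an untested sketch: split the scan into fixed-size chunks with a countable inner loop and no early exit, then locate the exact match with a scalar tail loop. This sidesteps the uncountable-loop and early-return problems at the source level, at the cost of rescanning one chunk. Whether GCC actually vectorizes the inner loop is an assumption. `search_line_chunked`, `is_special`, and the chunk size of 16 are hypothetical:

```c
#include <stddef.h>

/* Nonzero for the characters the scanner stops at: \n \r \\ ? */
static inline int is_special(unsigned char c)
{
    return (c == '\n') | (c == '\r') | (c == '\\') | (c == '?');
}

const unsigned char *search_line_chunked(const unsigned char *s,
                                         const unsigned char *end)
{
    enum { CHUNK = 16 };
    while ((size_t)(end - s) >= CHUNK) {
        int any = 0;
        /* Countable loop with no early exit: a vectorization candidate. */
        for (int i = 0; i < CHUNK; i++)
            any |= is_special(s[i]);
        if (any)
            break;              /* match is in this chunk; find it below */
        s += CHUNK;
    }
    for (; s < end; s++)        /* scalar scan of the last partial chunk */
        if (is_special(*s))
            return s;
    return end;                 /* no match before end */
}
```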

[Bug c++/116019] Incorrect cannot-tail messages on targets

2024-07-29 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116019

Andi Kleen  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|FIXED   |---

--- Comment #3 from Andi Kleen  ---
Patch was reverted, it just made a bunch of tests unsupported.

problems:
- Need unique name for each new test to not confuse the caching
- -O0 tests need to use musttail explicitly because the musttail pass
only handles musttail-annotated returns.

[Bug c++/116019] Incorrect cannot-tail messages on targets

2024-07-29 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116019

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Andi Kleen  ---
Reopened the wrong bug

[Bug testsuite/116080] [15 regression] New tests from r15-2233-g8d1af8f904a0c0 fail

2024-07-29 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080

--- Comment #8 from Andi Kleen  ---
Patch was reverted, it just made a bunch of tests unsupported.

problems:
- Need unique name for each new test to not confuse the caching
- -O0 tests need to use musttail explicitly because the musttail pass
only handles musttail-annotated returns.

[Bug testsuite/116080] [15 regression] New tests from r15-2233-g8d1af8f904a0c0 fail

2024-07-29 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116080

Andi Kleen  changed:

   What|Removed |Added

  Attachment #58761|0   |1
is obsolete||

--- Comment #9 from Andi Kleen  ---
Created attachment 58773
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58773&action=edit
New test suite fixes

[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0

2024-10-13 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #6 from Andi Kleen  ---
There are multiple issues (the subject should probably be renamed).

Apart from the inefficient bit test, the jump_table clustering is also very
inefficient, because it tries every possible cluster combination:

  unsigned l = clusters.length ();
...
  for (unsigned i = 1; i <= l; i++)
{
 ...

  for (unsigned j = 0; j < i; j++)

There must be a better algorithm for this? Perhaps it needs dynamic
programming?
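To make the cost of that loop shape concrete, here is a simplified model: a dynamic program over the sorted case values where min_clusters[i] is the fewest clusters covering the first i cases, and every (j, i) pair gets checked. This is only a sketch; the density rule and the min_density_percent parameter are assumptions standing in for GCC's real cost model in the switch lowering code.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Simplified density rule (an assumption, not GCC's cost model): the cases
// sorted[j..i-1] may share one jump table if they fill at least
// min_density_percent of the value range they span.
static bool dense_enough(const std::vector<std::int64_t> &sorted,
                         std::size_t j, std::size_t i,
                         unsigned min_density_percent)
{
  std::int64_t range = sorted[i - 1] - sorted[j] + 1;
  std::int64_t count = static_cast<std::int64_t>(i - j);
  return count * 100 >= range * static_cast<std::int64_t>(min_density_percent);
}

// min_clusters[i] = fewest clusters that cover the first i sorted cases.
// Same loop shape as the quoted code: every (j, i) pair is examined, so the
// work grows quadratically with the number of cases.
// Assumes min_density_percent <= 100, so a lone case is always valid.
std::size_t min_cluster_count(const std::vector<std::int64_t> &sorted,
                              unsigned min_density_percent)
{
  const std::size_t n = sorted.size();
  const std::size_t unreachable = std::numeric_limits<std::size_t>::max();
  std::vector<std::size_t> min_clusters(n + 1, unreachable);
  min_clusters[0] = 0;
  for (std::size_t i = 1; i <= n; ++i)
    for (std::size_t j = 0; j < i; ++j)
      if (min_clusters[j] != unreachable
          && dense_enough(sorted, j, i, min_density_percent)
          && min_clusters[j] + 1 < min_clusters[i])
        min_clusters[i] = min_clusters[j] + 1;
  return min_clusters[n];
}
```

With n cases this performs about n*n/2 density checks, which is what hurts on switches with many thousands of cases.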

Also, the function that creates the tree is recursive and can recurse quite
deeply (that should probably be fixed too).

> I wouldn't start with disabling this at -O0 and -O1 ;)

Even with a better algorithm, IMHO that's the right thing to do.

[Bug other/116970] New: -ftime-report -fdiagnostics-format=sarif-file causes ICE

2024-10-04 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116970

Bug ID: 116970
   Summary: -ftime-report -fdiagnostics-format=sarif-file causes
ICE
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

This is with tramp3d, but I suspect it will happen on other files too.

% ./gcc/cc1plus -ftime-report -fdiagnostics-format=sarif-file tramp3d-v4.i
cc1plus: internal compiler error: Segmentation fault
0x27206ee internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:517
0x133401f crash_signal
../../gcc/gcc/toplev.cc:321
0x27e7934 htab_hash_string
../../gcc/libiberty/hashtab.c:838
0x2715dde string_hash::hash(char const*)
../../gcc/gcc/hash-traits.h:239
0x2715dde simple_hashmap_traits,
sarif_artifact*>::hash(char const* const&)
../../gcc/gcc/hash-map-traits.h:50
0x2715dde hash_map, sarif_artifact*>
>::get(char const* const&)
../../gcc/gcc/hash-map.h:191
0x2715dde ordered_hash_map, sarif_artifact*>
>::get(char const* const&)
../../gcc/gcc/ordered-hash-map.h:76
0x2715dde sarif_builder::get_or_create_artifact(char const*,
diagnostic_artifact_role, bool)
../../gcc/gcc/diagnostic-format-sarif.cc:2892
0x2716403 sarif_output_format::sarif_output_format(diagnostic_context&,
line_maps const*, char const*, bool)
../../gcc/gcc/diagnostic-format-sarif.cc:3154
0x2716403
sarif_file_output_format::sarif_file_output_format(diagnostic_context&,
line_maps const*, char const*, bool, char const*)
../../gcc/gcc/diagnostic-format-sarif.cc:3193
0x2716403 std::enable_if::value,
std::unique_ptr > >::type
make_unique(diagnostic_context&, line_maps const*&, char
const*&, bool&, char const*&)
0x2716403 diagnostic_output_format_init_sarif_file(diagnostic_context&,
line_maps const*, char const*, bool, char const*)
../../gcc/gcc/diagnostic-format-sarif.cc:3392
0x26f0522 common_handle_option(gcc_options*, gcc_options*, cl_decoded_option
const*, unsigned int, int, unsigned int, cl_option_handlers const*,
diagnostic_context*, void (*)())
../../gcc/gcc/opts.cc:2968
0x26f5728 handle_option
../../gcc/gcc/opts-common.cc:1316
0x26f585e read_cmdline_option(gcc_options*, gcc_options*, cl_decoded_option*,
unsigned int, unsigned int, cl_option_handlers const*, diagnostic_context*)
../../gcc/gcc/opts-common.cc:1646
0x120f194 read_cmdline_options
../../gcc/gcc/opts-global.cc:242
0x120f194 decode_options(gcc_options*, gcc_options*, cl_decoded_option*,
unsigned int, unsigned int, diagnostic_context*, void (*)())
../../gcc/gcc/opts-global.cc:329

[Bug middle-end/117091] switch clustering takes extensive time with large switches even at -O0

2024-10-14 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117091

--- Comment #9 from Andi Kleen  ---
Yes, I guess we should keep the better switch lowering at -O1, because
machine-generated code may have a lot of switches.

I don't think we need perfect clustering, though. Perhaps there is some
heuristic that is good enough, maybe just k-means or something like that.

The deep recursion I saw was in balance_case_nodes.
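As an illustration of a "good enough" heuristic, here is a greedy single pass. It is a swapped-in technique (not k-means, and not GCC's code), and the density rule and min_density_percent are the same simplifying assumptions as a plain "fraction of the value range used" test. It runs in linear time but can return more clusters than an exhaustive search would.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical greedy clustering: walk the sorted case values once, extend
// the current cluster while it stays dense enough, otherwise start a new
// cluster at the current value.
// Assumes 0 < min_density_percent <= 100, so a single case always qualifies.
std::size_t greedy_cluster_count(const std::vector<std::int64_t> &sorted,
                                 unsigned min_density_percent)
{
  if (sorted.empty())
    return 0;
  std::size_t clusters = 1;
  std::size_t start = 0;  // index of the first case in the current cluster
  for (std::size_t i = 1; i < sorted.size(); ++i)
    {
      std::int64_t range = sorted[i] - sorted[start] + 1;
      std::int64_t count = static_cast<std::int64_t>(i - start + 1);
      if (count * 100 < range * static_cast<std::int64_t>(min_density_percent))
        {
          ++clusters;  // adding sorted[i] would make the cluster too sparse
          start = i;
        }
    }
  return clusters;
}
```

For {1, 10, 11, ..., 18} at 50% it returns 2, although all ten cases over a range of 18 would be dense enough for a single table; that is the kind of quality loss such a heuristic trades for speed.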

[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)

2024-12-26 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168

--- Comment #11 from Andi Kleen  ---
Fix posted here:

https://inbox.sourceware.org/gcc-patches/20241227024559.2224623-1-a...@firstfloor.org/T/#t

[Bug tree-optimization/118211] New: tree-vectorize: vectorize input.cc, find_end_of_line

2024-12-26 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118211

Bug ID: 118211
   Summary: tree-vectorize: vectorize input.cc, find_end_of_line
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

This is the hot loop of the line-searching function in GCC's input.cc.
It currently fails to vectorize with AVX512F. It would be nice if it could.

% gcc -O3 -mavx512f -fopt-info -fopt-info-all  eol.cc -S
Unit growth for small function inlining: 18->18 (0%)

Inlined 0 calls, eliminated 0 functions

BB 3 is always executed in loop 1
loop 1's coldest_outermost_loop is 1, hotter_than_inner_loop is NULL
eol.cc:6:36: missed: couldn't vectorize loop
eol.cc:8:11: missed: can't safely apply code motion to dependencies of _1 =
*s_12;
 to vectorize the early exit.
eol.cc:4:1: note: vectorized 0 loops in function.
eol.cc:23:36: note: * Analysis failed with vector mode V32QI
BB 3 is always executed in loop 1
loop 1's coldest_outermost_loop is 1, hotter_than_inner_loop is NULL
eol.cc:6:36: note: considering unrolling loop 1 at BB 5
considering unrolling loop with constant number of iterations
considering unrolling loop with runtime-computable number of iterations

#include <stddef.h>

const char *
find_end_of_line (const char *s, size_t len)
{
  for (const auto end = s + len; s != end; ++s)
    {
      if (*s == '\n')
        return s;
      if (*s == '\r')
        {
          const auto next = s + 1;
          if (next == end)
            {
              break;
            }
          return (*next == '\n' ? next : s);
        }
    }
  return nullptr;
}
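One way to process this loop in bigger steps without relying on the vectorizer is a SWAR ("SIMD within a register") scan: skip 8-byte blocks that contain neither '\n' nor '\r', then let the original byte loop resolve the exact position and the "\r\n" case. This is a portable sketch under those assumptions, not what the vectorizer would emit and not the code in input.cc; find_end_of_line_swar and its helpers are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if any byte of `word` equals `c`. XOR turns matching bytes into 0x00;
// (v - 0x01..01) & ~v & 0x80..80 is non-zero iff v contains a zero byte.
static inline bool word_has_byte(std::uint64_t word, unsigned char c)
{
  const std::uint64_t ones = 0x0101010101010101ULL;
  const std::uint64_t highs = 0x8080808080808080ULL;
  std::uint64_t v = word ^ (ones * c);
  return ((v - ones) & ~v & highs) != 0;
}

// Byte-by-byte tail with the same semantics as find_end_of_line.
static const char *scan_scalar(const char *s, const char *end)
{
  for (; s != end; ++s)
    {
      if (*s == '\n')
        return s;
      if (*s == '\r')
        {
          const char *next = s + 1;
          if (next == end)
            return nullptr;
          return *next == '\n' ? next : s;
        }
    }
  return nullptr;
}

const char *find_end_of_line_swar(const char *s, std::size_t len)
{
  const char *end = s + len;
  // Skip whole 8-byte blocks that contain no line terminator.
  while (end - s >= 8)
    {
      std::uint64_t word;
      std::memcpy(&word, s, sizeof word);  // safe unaligned load
      if (word_has_byte(word, '\n') || word_has_byte(word, '\r'))
        break;
      s += 8;
    }
  return scan_scalar(s, end);
}
```

The block test only has to answer "is there a terminator somewhere in these 8 bytes", which sidesteps byte-order questions; the exact offset is left to the scalar loop.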

[Bug target/118251] New: i386: Use carry bits of shifts

2024-12-30 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118251

Bug ID: 118251
   Summary: i386: Use carry bits of shifts
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Inspired by https://github.com/komrad36/CRC

Even though GCC now has CRC pattern matching (which should be implemented on
x86 too), it would still be good if it handled manually coded versions well.
What the compiler currently generates is not necessarily slower on a modern
superscalar CPU that is not frontend-bound, but it is larger:

#include <stdint.h>
#include <stdbool.h>

uint8_t right_shift(uint8_t v)
{
    bool rightmost_bit_set = v & 1;
    v >>= 1;
    if (rightmost_bit_set)
        v ^= 0x11EDC6F41;
    return v;
}

uint8_t left_shift(uint8_t v)
{
    bool leftmost_bit_set = v & 0x80;
    v <<= 1;
    if (leftmost_bit_set)
        v ^= 0x11EDC6F41;
    return v;
}

generates

        movl    %edi, %eax
        shrb    %dil
        andl    $1, %eax
        negl    %eax
        andl    $65, %eax
        xorl    %edi, %eax

...
        leal    (%rdi,%rdi), %eax
        movl    %eax, %edx
        xorl    $65, %edx
        testb   %dil, %dil
        cmovs   %edx, %eax

But on x86 the carry flag holds the bit that was shifted out, so this could be
optimized to use it directly, without the extra test:

        movl    %edi, %eax
        xorl    $0x82, %edi     # 0x82 = 65 << 1
        shrb    %dil            # dil = (v >> 1) ^ 65
        shrb    %al             # al = v >> 1, CF = old bit 0 of v
        cmovc   %edi, %eax

And similarly for the left-shift case. With APX the code could be even more
efficient.

I presume two peepholes could do it.
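For context on where this pattern shows up: in the reduced test case the result is stored in a uint8_t, so 0x11EDC6F41 is truncated to its low byte 0x41 (the $65 in the listings). 0x11EDC6F41 is the CRC-32C (Castagnoli) polynomial 0x1EDC6F41 with its x^32 term. A hand-coded bitwise CRC-32C runs the same "shift, then conditionally xor" step on a 32-bit value, eight times per input byte. The sketch below assumes the common reflected variant (polynomial 0x82F63B78, initial value and final xor 0xFFFFFFFF); the function names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// One reflected CRC step: remember the bit that will be shifted out, shift,
// and xor the polynomial in if that bit was set (the right_shift pattern).
static inline std::uint32_t crc32c_step(std::uint32_t crc)
{
  bool rightmost_bit_set = crc & 1;
  crc >>= 1;
  if (rightmost_bit_set)
    crc ^= 0x82F63B78u;  // bit-reversed form of 0x1EDC6F41
  return crc;
}

// Bitwise CRC-32C over a buffer: the step runs 8 times per byte, so any
// saving from using the carry flag is multiplied accordingly.
std::uint32_t crc32c_bitwise(const unsigned char *data, std::size_t len)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; ++bit)
        crc = crc32c_step(crc);
    }
  return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for CRC-32C over the ASCII string "123456789" is 0xE3069283, which makes a convenient test when experimenting with code generation for crc32c_step.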

[Bug target/118252] New: i386 should implement CASE_VECTOR_SHORTEN_MODE

2024-12-30 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118252

Bug ID: 118252
   Summary: i386 should implement CASE_VECTOR_SHORTEN_MODE
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Inspired by https://github.com/komrad36/CRC

When generating a jump table for a switch, GCC always uses .long for PIC or
.quad for non-PIC. This wastes both code size and dcache footprint. The offset
jump sequence is slightly longer, but shortening the table more than makes up
for it.

First, it should probably always use .long to save dcache, but if the switch
code fits into 16k or 8k it should use .word or .byte. GCC already has an
optimization pass for this as part of jump shortening, but it requires
implementing the CASE_VECTOR_SHORTEN_MODE target macro, which is used by some
other targets, but not x86.
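The decision such a hook has to make can be sketched as: pick the narrowest entry that still holds every label's offset from the table base. The function names below are hypothetical, and the sketch only considers signed entries. The real macro, as described in the GCC internals manual, receives the minimum and maximum offsets (plus the table body) and returns a machine mode rather than a byte count.

```cpp
#include <cstdint>

// Narrowest signed entry size (in bytes) that can hold all label offsets
// in [min_offset, max_offset], measured from the table base.
unsigned jump_table_entry_bytes(std::int64_t min_offset, std::int64_t max_offset)
{
  if (min_offset >= INT8_MIN && max_offset <= INT8_MAX)
    return 1;  // .byte
  if (min_offset >= INT16_MIN && max_offset <= INT16_MAX)
    return 2;  // .word
  return 4;    // .long (assumes offsets fit in 32 bits, as with PIC tables)
}

// Table size for n_entries cases with the chosen entry width.
std::uint64_t jump_table_bytes(std::uint64_t n_entries,
                               std::int64_t min_offset, std::int64_t max_offset)
{
  return n_entries * jump_table_entry_bytes(min_offset, max_offset);
}
```

For example, a 256-entry table whose targets lie within 1000 bytes of the base needs 512 bytes with .word entries, against 1024 with .long or 2048 with .quad.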

[Bug tree-optimization/118250] missed optimization in multiple integer comparisons (like errno tests)

2024-12-30 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118250

--- Comment #2 from Andi Kleen  ---
With 

--param=switch-lower-slow-alg-max-cases=1

(i.e. using the greedy algorithm), trunk includes "38" in the first bit
cluster, but the LLVM code is still better. I've seen the dynamic programming
algorithm miss clusters like this before.


        movl    (%rdi), %eax
        cmpl    $38, %eax
        jg      .L2
        cmpl    $1, %eax
        jle     .L4
        movabsq $274949472260, %rdx
        btq     %rax, %rdx
        setc    %al
        ret
        .p2align 4,,10
        .p2align 3
.L2:
        cmpl    $95, %eax
        sete    %al
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        xorl    %eax, %eax
        ret
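Decoding the movabsq immediate 274949472260 gives bits 2, 18, 22, 26 and 38. Together with the final cmpl $95, the listing appears to test *p against {2, 18, 22, 26, 38, 95}. That value set is inferred from the assembly alone, since the test case source is not quoted here. The sketch below writes the same bit-test lowering by hand, with a static_assert confirming the decoded mask.

```cpp
#include <cstdint>

// Values 2, 18, 22, 26 and 38 packed into one 64-bit mask.
constexpr std::uint64_t kMask = (std::uint64_t{1} << 2) | (std::uint64_t{1} << 18)
                              | (std::uint64_t{1} << 22) | (std::uint64_t{1} << 26)
                              | (std::uint64_t{1} << 38);
static_assert(kMask == 274949472260ULL, "same immediate as the movabsq above");

// Hand-written equivalent of the listing: values above 38 only match 95,
// values below 2 never match, and 2..38 are decided by one bit test.
bool matches(const int *p)
{
  int v = *p;
  if (v > 38)
    return v == 95;
  if (v <= 1)
    return false;
  return (kMask >> v) & 1;
}
```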

[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)

2025-02-07 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168

Andi Kleen  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #19 from Andi Kleen  ---
All the fixes went in (except for the vectorization of the line search
function, but with the data structure improvements it's unclear whether that is
still needed).

[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)

2025-02-02 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168

--- Comment #17 from Andi Kleen  ---
With the patches now in trunk, the overhead of enabling -Wmisleading-indentation
is ~32%, unless --param=file-cache-lines=1 is used. With the drop-behind cache it
would be in the noise.

[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)

2024-12-22 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168

--- Comment #6 from Andi Kleen  ---
So the file cache has a window of 100 lines:

static const size_t line_record_size = 100;

The indentation code rereads the lines of the guard, the body and the next
statement, and all of them are cached as long as they are within 100 lines of
where the lexer is.

But for some reason, in your code the lexer is often very far ahead; in one
example it was 60k lines ahead.

That means the file cache rereads the previous window and then moves forward
again, which is a lot of extra reading. It also reads the full 100 lines each
time.

(I think sometimes, when it is very far away, it may even have lost the line
offset and need to read even more.)

Also, it's strange that ferror is expensive (and feof is not)?

I'm not fully sure why the lexer is so far ahead. Maybe there is a lot of
peeking somewhere?

A fix would be to always reread the line when the parser is in the right spot,
and remember the indentation until the end of the statement to check.

Or use mmap on the whole file to make this a lot cheaper (but that would still
need a line table).
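A sketch of the line-table half of the mmap idea: index every line start once, after which any line can be fetched in O(1), no matter how far the lexer is from the parser. A std::string stands in for the mapped file (mmap itself is left out to keep the sketch portable), '\r' is not treated specially, and LineTable is a hypothetical name, not anything in input.cc.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

class LineTable
{
public:
  // One pass over the buffer records the offset where each line starts.
  explicit LineTable(std::string contents) : m_data(std::move(contents))
  {
    m_starts.push_back(0);
    for (std::size_t i = 0; i < m_data.size(); ++i)
      if (m_data[i] == '\n')
        m_starts.push_back(i + 1);
  }

  // 1-based line number; returns the line without its '\n', or an empty
  // view if the line does not exist.
  std::string_view line(std::size_t lineno) const
  {
    if (lineno == 0 || lineno > m_starts.size())
      return {};
    std::size_t begin = m_starts[lineno - 1];
    std::size_t end = lineno < m_starts.size() ? m_starts[lineno] - 1
                                               : m_data.size();
    return std::string_view(m_data).substr(begin, end - begin);
  }

private:
  std::string m_data;                // stands in for the mmap'ed file
  std::vector<std::size_t> m_starts; // byte offset of each line start
};
```

The cost is one offset per line of memory, in exchange for never rescanning earlier parts of the file.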

[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)

2024-12-22 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168

--- Comment #7 from Andi Kleen  ---
Actually, in my case, where I interrupted it and the difference was 60k lines,
I think the problem was that the lexer offset was beyond the 100 lines where
the position is cached. When that happens, the file_cache just starts reading
from the beginning of the file again, and that burns all the CPU time.

(And that can repeat many times as the position moves forward and backward.)

So the real question is why the lexer is so far ahead of the parser.

But it's also a general problem: even if the lexer weren't ahead, a 100-line
comment in the middle of a statement could trigger this behavior too.

[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)

2024-12-23 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168

--- Comment #10 from Andi Kleen  ---
My earlier analysis was wrong. The file cache is supposed to avoid exactly
this quadratic case.

But the cache only works if the linemap knows the total number of lines;
otherwise it uses a much slower fallback method, which ends up reparsing the
line boundaries all the time and wastes all the CPU.

And the linemap doesn't know it for C, because the parser runs before the file
is fully lexed:

  /* This is the total number of lines of the current file.  At the
 moment, we try to get this information from the line map
 subsystem.  Note that this is just a hint.  When using the C++
 front-end, this hint is correct because the input file is then
 completely tokenized before parsing starts; so the line map knows
 the number of lines before compilation really starts.  For e.g,
 the C front-end, it can happen that we start emitting diagnostics
 before the line map has seen the end of the file.  */

I think you wouldn't see the problem with C++.

I'm not fully sure why the cache really needs the maximum number of lines;
maybe it could work without it.

Alternatively, there's already some code in input.cc to update m_total_lines
from the input file without requiring the linemap, but it's not used in the
misleading-indentation diagnostics case.

It would also probably be a good idea to add some tunables for the caches
(although, given the problem above, that alone doesn't fix the issue).
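On the question of whether the cache needs the total line count: one way a bounded line table can avoid it is to record every stride-th line start and, whenever the table fills up, drop every other entry and double the stride. This is a hypothetical sketch of that decimation technique, not how input.cc works; SampledLineIndex, note_line and nearest are invented names, and capacity plays the role a limit like line_record_size would.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Invariant: m_offsets[k] is the byte offset of line k * m_stride.
class SampledLineIndex
{
public:
  // Assumes capacity >= 1.
  explicit SampledLineIndex(std::size_t capacity) : m_capacity(capacity) {}

  // Report every line start in order, beginning with line 0.
  void note_line(std::size_t lineno, std::size_t offset)
  {
    if (lineno % m_stride != 0)
      return;
    if (m_offsets.size() == m_capacity)
      {
        decimate();
        if (lineno % m_stride != 0)
          return;
      }
    m_offsets.push_back(offset);
  }

  // Nearest recorded line at or before lineno, as (line, offset); a caller
  // would scan forward from there to reach lineno.
  std::pair<std::size_t, std::size_t> nearest(std::size_t lineno) const
  {
    if (m_offsets.empty())
      return {0, 0};
    std::size_t k = lineno / m_stride;
    if (k >= m_offsets.size())
      k = m_offsets.size() - 1;
    return {k * m_stride, m_offsets[k]};
  }

  std::size_t stride() const { return m_stride; }
  std::size_t recorded() const { return m_offsets.size(); }

private:
  // Keep entries 0, 2, 4, ...; they are lines 0, 2*stride, 4*stride, ...
  void decimate()
  {
    std::size_t kept = 0;
    for (std::size_t i = 0; i < m_offsets.size(); i += 2)
      m_offsets[kept++] = m_offsets[i];
    m_offsets.resize(kept);
    m_stride *= 2;
  }

  std::size_t m_capacity;
  std::size_t m_stride = 1;
  std::vector<std::size_t> m_offsets;
};
```

Memory stays bounded by capacity, and the distance to scan from the nearest recorded line grows only with the stride, without ever knowing how long the file is.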

[Bug preprocessor/118168] -Wmisleading-indentation causes 10x+ overhead or higher (visible on mypy-1.13.0)

2024-12-23 Thread andi-gcc at firstfloor dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118168

--- Comment #9 from Andi Kleen  ---
Created attachment 59954
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59954&action=edit
add tunables for file cache
