[Bug target/114160] ICE on RISCV (-mcpu=thead-c906) when building glibc in dwarf2out_frame_debug_cfa_offset

2024-03-18 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114160

Christoph Müllner  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-03-18

--- Comment #4 from Christoph Müllner  ---
I now have permission. Thanks Sam!

[Bug target/114160] ICE on RISCV (-mcpu=thead-c906) when building glibc in dwarf2out_frame_debug_cfa_offset

2024-03-18 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114160

Christoph Müllner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Christoph Müllner  ---
Closing as fixed.

[Bug target/114194] ICE when using std::unique_ptr with xtheadvector

2024-03-21 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194

Christoph Müllner  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org
   Last reconfirmed||2024-03-21
 CC||cmuellner at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Target Milestone|--- |14.0

--- Comment #7 from Christoph Müllner  ---
Thanks for reporting and providing several minimal reproducers.

I can reproduce the issue and have further analyzed it.
During the analysis, I noticed that not only memset-zero (clear-memory) is
affected, but all memset expansions (e.g. `memset(p, 3, 15)`).

I also have a potential fix that will be sent to the list once the testing run
is completed.

[Bug target/114194] ICE when using std::unique_ptr with xtheadvector

2024-03-22 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114194

Christoph Müllner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #8 from Christoph Müllner  ---
Closing as resolved (the fix has been pushed on master).

[Bug target/116131] [14/15 Regression] RISC-V: Unrecognizable insn with xtheadmemidx on rv32

2024-07-30 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116131

--- Comment #3 from Christoph Müllner  ---
After passing the tests, I've posted the patch on the mailing list:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658726.html

[Bug target/116033] [14 only] RISC-V: -march=rv64gv_xtheadmemidx generates illegal vse8.v insn

2024-08-05 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116033

Christoph Müllner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Christoph Müllner  ---
The fix was accepted and was pushed to master on Jul 25:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a86c0cb9379e7b86625908a0250cf698276e9e02

For GCC 14, we had to wait until GCC 14.2 was released; the backport has now
been pushed to releases/gcc-14:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=eccf707e5ceb7e405ffe4edfbcae2f769b8386cf

Closing this ticket as resolved.

Patrick, thank you for reporting!

[Bug target/116131] [14/15 Regression] RISC-V: Unrecognizable insn with xtheadmemidx on rv32

2024-08-07 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116131

--- Comment #5 from Christoph Müllner  ---
I've prepared a patchset that eliminates the optimization patterns for
XThead(F)MemIdx, which produce the non-canonical MEMs. As a side-effect, this
change also fixes the issue reported here. However, it also triggers another
ICE (in the case of enabled XThead(F)MemIdx and XTheadFmv/Zfa), which is
addressed in the last patch of the series:

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659676.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659677.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659678.html

[Bug rtl-optimization/116353] [15 Regression] ICE on glibc-2.39: RTL pass: ce2, in expand_simple_binop, at optabs.cc:1264 since r15-2890-g72c9b5f438f22c

2024-08-13 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116353

--- Comment #7 from Christoph Müllner  ---
> To add on to the info provided by Manolis, this is the diff for the proposed
> fix:
> 
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 3e25f30b67e..da59c907891 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3938,8 +3938,10 @@ bb_ok_for_noce_convert_multiple_sets (basic_block
> test_bb, unsigned *cost)
>rtx src = SET_SRC (set);
>  
>/* Do not handle anything involving memory loads/stores since it might
> -violate data-race-freedom guarantees.  */
> -  if (!REG_P (dest) || contains_mem_rtx_p (src))
> +violate data-race-freedom guarantees.  Make sure we can force SRC
> +to a register as that may be needed in try_emit_cmove_seq.  */
> +  if (!REG_P (dest) || contains_mem_rtx_p (src)
> + || !noce_can_force_operand (src))
> return false;
>  
>/* Destination and source must be appropriate.  */

I've successfully bootstrapped the proposed change on top of master with
`--enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto` on x86-64 and
aarch64.

So the change is:
  Tested-by: Christoph Müllner 

[Bug rtl-optimization/116349] [15 regression] ICE in expand_simple_binop, at optabs.cc:1264 when building libgo

2024-08-13 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116349

--- Comment #7 from Christoph Müllner  ---
(In reply to seurer from comment #6)
> I am seeing this same failure in doing a bootstrap build during stage 2 on
> powerpc64:

A fix that is confirmed to work on AArch64 and x86-64 has been posted here (see
PR116353):
  https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660245.html

[Bug target/114673] New: RISC-V: "L" constraint cannot be used for lui in inline asm

2024-04-09 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114673

Bug ID: 114673
   Summary: RISC-V: "L" constraint cannot be used for lui in inline asm
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cmuellner at gcc dot gnu.org
  Target Milestone: ---

The RISC-V-specific "L" constraint is neither documented nor tested.
In constraints.md it is defined as "A U-type 20-bit signed immediate.".
It tests if the value is a constant int that satisfies LUI_OPERAND(),
i.e. a value with the lowest 12 bits zero.

One obvious use-case is to use "L" for "lui" in inline asm.
However, it does not work as expected:

long getB()
{
    //lui a0,0x1800
    return 3<<23; //0x1800000
}

long getB_asm_i()
{
    long reg;
    //lui a0,0x1800
    asm("lui %0, %1" : "=r"(reg) : "i"((3<<23) >> 12));
    return reg;
}

long getB_asm_L()
{
    long reg;
    //Assembler error: lui expression not in range 0..1048575
    asm("lui %0, %1" : "=r"(reg) : "L"(3ul<<23));
    return reg;
}

long getB_asm_Lshift()
{
    long reg;
    //Compiler error: impossible constraint in 'asm'
    asm("lui %0, %1" : "=r"(reg) : "L"((3<<23) >> 12));
    return reg;
}

The "L" constraint was introduced as part of the initial RISC-V port.
I could not find any tests/documentation, so I am unsure if it can be fixed
or if a new constraint should be introduced.

My preferred fix would be to shift the provided constant right by 12
if it satisfies LUI_OPERAND(), so that getB_asm_L() would work.
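To illustrate the proposed behaviour, here is a small model of an "L" operand check plus the right shift by 12 that lui's 20-bit immediate field needs. lui_operand_p() and lui_imm20() are hypothetical helpers, not GCC code; the predicate assumes LUI_OPERAND() accepts values whose low 12 bits are zero and that fit in a sign-extended 32-bit immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for LUI_OPERAND() (assumption): the low 12 bits
   are zero and the value fits in a sign-extended 32-bit immediate. */
static bool lui_operand_p(int64_t value)
{
    return (value & 0xfff) == 0 && value >= INT32_MIN && value <= INT32_MAX;
}

/* Proposed "L" behaviour: print the 20-bit field that lui expects, i.e.
   the constant shifted right by 12. Shifting as unsigned and masking
   keeps this well-defined for negative constants (e.g. -4096 -> 0xfffff). */
static uint32_t lui_imm20(int64_t value)
{
    return (uint32_t)(((uint64_t)value >> 12) & 0xfffff);
}
```

With this model, the constant from getB_asm_L() (3<<23) passes the check and prints as 0x1800, while the pre-shifted value from getB_asm_Lshift() (0x1800) still fails it because its low 12 bits are not zero.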

[Bug middle-end/111111] omnetpp: ICEs with dump flags, PGO and LTO

2024-05-03 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11

Christoph Müllner  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Christoph Müllner  ---
This can't be reproduced anymore (retested with master and releases/gcc-14).

[Bug target/111501] RISC-V: non-optimal casting when shifting

2024-05-06 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111501

Christoph Müllner  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org
 CC||cmuellner at gcc dot gnu.org

--- Comment #3 from Christoph Müllner  ---
I noticed this a while ago as well (when working on the XTheadB* stuff).
This can be addressed with an insn_and_split for zero_extract.
I even wrote a patch for that back then, but forgot to send it out.
I've rebased/retested it now and will send it once the release is out.

By the way, LLVM catches all of these cases.

[Bug target/111501] RISC-V: non-optimal casting when shifting

2024-05-16 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111501

Christoph Müllner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Christoph Müllner  ---
Closing this, as it has been fixed on master.

[Bug rtl-optimization/115344] New: Missing loop counter reversal

2024-06-04 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115344

Bug ID: 115344
   Summary: Missing loop counter reversal
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cmuellner at gcc dot gnu.org
  Target Milestone: ---

Let's take a simple for-loop with an unknown bound:

void bar ();
void foo1 (int n) {
    for (int i = 0; i < n; i++) {
        bar ();
    }
}

We see that there are two variables in the program,
but we could eliminate the loop variable `i` as follows (for n >= 0):

void bar ();
void foo2 (int n) {
    while (n) {
        bar ();
        n--;
    }
}

Optimizing the loop as above has the following benefits:
- No need for a register for the loop variable `i`
- No need for an additional slot in the stack frame
- No need for instructions to save/restore the loop variable register in the
prologue/epilogue
- No need for an initialization instruction for the loop variable `i` (to zero)

LLVM does this transformation on (at least) x86-64,  RISC-V (rv64gc),
and AArch64 with -O3, but GCC does not.
Tests have been done with trunk and older GCC releases (I've tested down to GCC
4.4).
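Because foo2 differs from foo1 for negative n (foo1 makes no calls, foo2 keeps decrementing), a general rewrite has to keep a guard for n <= 0, which is what LLVM emits for foo1 (`cmp w0, #1` / `b.lt` on AArch64, `blez` on RISC-V). A minimal C sketch of the intended transformation; the call counter standing in for the external bar() is an assumption made only so the two forms can be compared.

```c
/* Hypothetical stand-in for the external bar(): counts calls. */
static int calls;
static void bar(void) { calls++; }

/* Original up-counting loop with a separate induction variable i. */
static void foo1(int n)
{
    for (int i = 0; i < n; i++)
        bar();
}

/* Reversed loop: n itself is the trip counter, so no extra register
   is needed. The n <= 0 guard preserves foo1's behaviour for n < 0. */
static void foo1_reversed(int n)
{
    if (n <= 0)
        return;
    do {
        bar();
    } while (--n != 0);
}
```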

Related bug tickets:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22041 (open - with uses of the loop counter as an array index)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31238 (closed - fixed for GCC 4.5.0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40886 (closed - fixed for GCC 4.5.0)

GCC AArch64 -O3:
foo1:
        cmp     w0, 0
        ble     .L6
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        stp     x19, x20, [sp, 16]
        mov     w20, w0
        mov     w19, 0
.L3:
        add     w19, w19, 1
        bl      bar
        cmp     w20, w19
        bne     .L3
        ldp     x19, x20, [sp, 16]
        ldp     x29, x30, [sp], 32
        ret
.L6:
        ret
foo2:
        cbz     w0, .L18
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        str     x19, [sp, 16]
        mov     w19, w0
.L12:
        bl      bar
        subs    w19, w19, #1
        bne     .L12
        ldr     x19, [sp, 16]
        ldp     x29, x30, [sp], 32
        ret
.L18:
        ret

LLVM AArch64 -O3:
foo1:                                   // @foo1
        cmp     w0, #1
        b.lt    .LBB0_4
        stp     x29, x30, [sp, #-32]!   // 16-byte Folded Spill
        str     x19, [sp, #16]          // 8-byte Folded Spill
        mov     x29, sp
        mov     w19, w0
.LBB0_2:                                // =>This Inner Loop Header: Depth=1
        bl      bar
        subs    w19, w19, #1
        b.ne    .LBB0_2
        ldr     x19, [sp, #16]          // 8-byte Folded Reload
        ldp     x29, x30, [sp], #32     // 16-byte Folded Reload
.LBB0_4:
        ret
foo2:                                   // @foo2
        cbz     w0, .LBB1_4
        stp     x29, x30, [sp, #-32]!   // 16-byte Folded Spill
        str     x19, [sp, #16]          // 8-byte Folded Spill
        mov     x29, sp
        mov     w19, w0
.LBB1_2:                                // =>This Inner Loop Header: Depth=1
        bl      bar
        subs    w19, w19, #1
        b.ne    .LBB1_2
        ldr     x19, [sp, #16]          // 8-byte Folded Reload
        ldp     x29, x30, [sp], #32     // 16-byte Folded Reload
.LBB1_4:
        ret

GCC RISC-V -O3 -march=rv64gc:
foo1:
        ble     a0,zero,.L6
        addi    sp,sp,-32
        sd      s0,16(sp)
        sd      s1,8(sp)
        sd      ra,24(sp)
        mv      s1,a0
        li      s0,0
.L3:
        addiw   s0,s0,1
        call    bar
        bne     s1,s0,.L3
        ld      ra,24(sp)
        ld      s0,16(sp)
        ld      s1,8(sp)
        addi    sp,sp,32
        jr      ra
.L6:
        ret
foo2:
        beq     a0,zero,.L18
        addi    sp,sp,-16
        sd      s0,0(sp)
        sd      ra,8(sp)
        mv      s0,a0
.L12:
        addiw   s0,s0,-1
        call    bar
        bne     s0,zero,.L12
        ld      ra,8(sp)
        ld      s0,0(sp)
        addi    sp,sp,16
        jr      ra
.L18:
        ret

LLVM RISC-V -O3 -march=rv64gc:
foo1:                                   # @foo1
        blez    a0, .LBB0_4
        addi    sp, sp, -16
        sd      ra, 8(sp)               # 8-byte Folded Spill
        sd      s0, 0(sp)               # 8-byte Folded Spill
        mv      s0, a0
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        call    bar
        addiw   s0, s0, -1
        bnez    s0, .LBB0_2
        ld      ra, 8(sp)               # 8-byte Folded Reload
        ld      s0, 0(sp)               # 8-byte Folded Reload
        addi    sp, sp, 16
.LBB0_4:
        ret
foo2:                                   # @foo2
        beqz    a0, .LBB1_4
        addi    sp, sp

[Bug target/115554] New: RISC-V: ICE in case of multiple target-arch attributes

2024-06-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115554

Bug ID: 115554
   Summary: RISC-V: ICE in case of multiple target-arch attributes
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cmuellner at gcc dot gnu.org
  Target Milestone: ---

Minimal reproducers (for target-arch):

extern
__attribute__((target("arch=+zba")))
__attribute__((target("arch=+zbb")))
void foo(void);

extern
__attribute__((target("arch=+zbb")))
__attribute__((target("arch=+zbb")))
void bar(void);

The ICE is a bug.
If multiple target-arch attributes should not be allowed,
then an error message is the right solution.

Allowing multiple target-X attributes is problematic, as can be seen for foo():
does the second attribute amend or replace the previous one?
However, accepting multiple target-X attributes if they are equal (as for
bar()) could be done.

The assertion was added in the GCC 14 cycle (commit 9941f0295a1).
GCC 14 and 15 are affected. GCC 13 is not affected (we don't have RISC-V
target-arch attributes in GCC 13).

The ICE looks like this:
$ riscv64-unknown-linux-gnu-gcc bar.c -c
bar.c:4:1: internal compiler error: in riscv_func_target_put, at common/config/riscv/riscv-common.cc:521
4 | void foo(void);
  | ^~~~
0xc15306 riscv_func_target_put(tree_node*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
        /home/cm/src/gcc/riscv-mainline/gcc/common/config/riscv/riscv-common.cc:521
0x18234d3 riscv_process_target_attr
        /home/cm/src/gcc/riscv-mainline/gcc/config/riscv/riscv-target-attr.cc:370
0x182334c riscv_process_target_attr
        /home/cm/src/gcc/riscv-mainline/gcc/config/riscv/riscv-target-attr.cc:314
0x182363d riscv_option_valid_attribute_p(tree_node*, tree_node*, tree_node*, int)
        /home/cm/src/gcc/riscv-mainline/gcc/config/riscv/riscv-target-attr.cc:389
0xd560ee handle_target_attribute
        /home/cm/src/gcc/riscv-mainline/gcc/c-family/c-attribs.cc:5915
0xc24d04 decl_attributes(tree_node**, tree_node*, int, tree_node*)
        /home/cm/src/gcc/riscv-mainline/gcc/attribs.cc:900
0xc2bbed c_decl_attributes
        /home/cm/src/gcc/riscv-mainline/gcc/c/c-decl.cc:5501
0xc43b77 start_decl(c_declarator*, c_declspecs*, bool, tree_node*, bool, unsigned int*)
        /home/cm/src/gcc/riscv-mainline/gcc/c/c-decl.cc:5647
0xcb4d73 c_parser_declaration_or_fndef
        /home/cm/src/gcc/riscv-mainline/gcc/c/c-parser.cc:2773
0xcc158b c_parser_external_declaration
        /home/cm/src/gcc/riscv-mainline/gcc/c/c-parser.cc:2053
0xcc1fb5 c_parser_translation_unit
        /home/cm/src/gcc/riscv-mainline/gcc/c/c-parser.cc:1907
0xcc1fb5 c_parse_file()
        /home/cm/src/gcc/riscv-mainline/gcc/c/c-parser.cc:27303
0xd3b9c1 c_common_parse_file()
        /home/cm/src/gcc/riscv-mainline/gcc/c-family/c-opts.cc:1322

[Bug target/115554] RISC-V: ICE in case of multiple target-arch attributes

2024-06-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115554

--- Comment #1 from Christoph Müllner  ---
Forgot to mention:
The ICE is triggered by an assertion in riscv_func_target_put(), which ensures
we don't have more than one target-arch attribute in one function declaration.
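The bug description suggests that repeated target-arch attributes could be accepted when they are equal. A minimal sketch of that policy, replacing the assertion with a three-way result; merge_target_arch() and the enum are hypothetical names and do not reflect the actual riscv_func_target_put() code.

```c
#include <stddef.h>
#include <string.h>

enum attr_result { ATTR_NEW, ATTR_DUPLICATE_OK, ATTR_CONFLICT };

/* EXISTING is the arch string already recorded for the declaration
   (NULL if none); INCOMING comes from the next target-arch attribute. */
static enum attr_result merge_target_arch(const char *existing,
                                          const char *incoming)
{
    if (existing == NULL)
        return ATTR_NEW;            /* first attribute: record it */
    if (strcmp(existing, incoming) == 0)
        return ATTR_DUPLICATE_OK;   /* like bar(): same attribute twice */
    return ATTR_CONFLICT;           /* like foo(): emit an error, don't assert */
}
```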

[Bug target/115562] New: RISC-V: ICE because of reused fndecl with target-arch attribute

2024-06-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115562

Bug ID: 115562
   Summary: RISC-V: ICE because of reused fndecl with target-arch attribute
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cmuellner at gcc dot gnu.org
  Target Milestone: ---

Minimal (?) reproducer:

$ cat foo-copy.c 
void foo (void);

__attribute__((target("arch=+zbb")))
void*
memcpy (void *d, const void *, unsigned long)
{
  return d;
}
__attribute__((target("arch=+zbb"))) void fun0(void) {}
__attribute__((target("arch=+zbb"))) void fun1(void) {}
__attribute__((target("arch=+zbb"))) void fun2(void) {}
__attribute__((target("arch=+zbb"))) void fun3(void) {}
__attribute__((target("arch=+zbb"))) void fun4(void) {}
__attribute__((target("arch=+zbb"))) void fun5(void) {}
__attribute__((target("arch=+zbb"))) void fun6(void) {}
__attribute__((target("arch=+zbb"))) void fun7(void) {}
__attribute__((target("arch=+zbb"))) void fun8(void) {}
__attribute__((target("arch=+zbb"))) void fun9(void) {}
__attribute__((target("arch=+zbb"))) void fun10(void) {}
__attribute__((target("arch=+zbb"))) void fun11(void) {}
__attribute__((target("arch=+zbb"))) void fun12(void) {}

This is similar to PR115554, but here the assertion in riscv_func_target_put()
is triggered because, when processing `fun12`, the fndecl is equal to the
previously processed fndecl of `memcpy`. I.e., the assumption that the fndecl
pointer can be used as an identifier (or as a key for the hash table) does not
hold.

Like PR115554, this bug is present in GCC 14 and on the master branch.

The ICE looks the same as for PR115554 (the same assertion is triggered).
To analyze this issue, I've extended riscv_func_target_put() like this:

+  if (*target_info_slot)
+    {
+      inform (loc, "Hash collision detected:");
+      inform (loc, "  old function: %qE (%p)", (*target_info_slot)->fn_decl, (*target_info_slot)->fn_decl);
+      inform (loc, "  old attributes: %s", (*target_info_slot)->fn_target_name.c_str());
+      inform (loc, "  new function: %qE", fn_decl);
+      inform (loc, "  new attributes: %s", fn_target_name.c_str ());
+    }
+  else
+    {
+      inform (loc, "Adding target attributes to function:");
+      inform (loc, "  new function: %qE (%p)", fn_decl, fn_decl);
+      inform (loc, "  new attributes: %s", fn_target_name.c_str ());
+    }
   gcc_assert (!*target_info_slot);

Additionally, I've included tree.h and added "location_t loc" as parameter
of this function. This gives the following output on the reproducer above:

$ /opt/riscv-mainline/bin/riscv64-unknown-linux-gnu-gcc -c foo-copy.c
foo-copy.c:5:1: note: Adding target attributes to function: 
5 | memcpy (void *d, const void *, unsigned long)   
  | ^~
foo-copy.c:5:1: note:   new function: 'memcpy' (0x7f295879e200)  // first appearance
foo-copy.c:5:1: note:   new attributes: rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zbb1p0
foo-copy.c:10:43: note: Adding target attributes to function:
   10 | __attribute__((target("arch=+zbb"))) void fun0(void) {}
  |   ^~~~
foo-copy.c:10:43: note:   new function: 'fun0' (0x7f295879e400)
foo-copy.c:10:43: note:   new attributes: rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zbb1p0
[...]
foo-copy.c:22:43: note: Adding target attributes to function:
   22 | __attribute__((target("arch=+zbb"))) void fun11(void) {}
  |   ^
foo-copy.c:22:43: note:   new function: 'fun11' (0x7f295879ef00)
foo-copy.c:22:43: note:   new attributes: rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zbb1p0
foo-copy.c:23:43: note: Hash collision detected:
   23 | __attribute__((target("arch=+zbb"))) void fun12(void) {}
  |   ^
foo-copy.c:23:43: note:   old function: 'fun12' (0x7f295879e200)  // same address!
foo-copy.c:23:43: note:   old attributes: rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zbb1p0
foo-copy.c:23:43: note:   new function: 'fun12'
foo-copy.c:23:43: note:   new attributes: rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zbb1p0
foo-copy.c:23:1: internal compiler error: in riscv_func_target_put, at common/config/riscv/riscv-common.cc:536
   23 | __attribute__((target("arch=+zbb"))) void fun12(void) {}
  | ^

As can be seen in the example above, fndecl of `memcpy` has the address
0x7f295879e200, which is equal to the address of fndecl of `fun12`.

Note that even small adjustments to the source will break the reproducer.
Therefore, I could not rename `memcpy` to something different.
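The failure mode can be modeled in a few lines: a side table keyed by an object's address reports a false "collision" once that object's storage is recycled for a different object. Everything below (decl, the slot pool, table_put()) is a hypothetical toy, not GCC code; it only assumes, as the notes above show, that the fndecl of `fun12` ended up at the old address of `memcpy`'s fndecl.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy declarations from a pool that recycles freed slots, standing in
   for garbage-collected tree nodes (hypothetical). */
typedef struct decl { const char *name; } decl;

#define POOL_SIZE 4
static decl pool[POOL_SIZE];
static bool in_use[POOL_SIZE];

static decl *alloc_decl(const char *name)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            pool[i].name = name;
            return &pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

static void free_decl(decl *d)
{
    in_use[d - pool] = false; /* the slot can now be handed out again */
}

/* Side table keyed by the declaration's address. Entries are never
   removed when a declaration dies, which is the hazard. */
#define TABLE_SIZE 8
static const decl *table_keys[TABLE_SIZE];
static size_t table_len;

/* Returns false if KEY is already present (where the real code asserts). */
static bool table_put(const decl *key)
{
    for (size_t i = 0; i < table_len; i++)
        if (table_keys[i] == key)
            return false;
    if (table_len == TABLE_SIZE)
        return false;
    table_keys[table_len++] = key;
    return true;
}
```

Allocating "memcpy", recording it, freeing it and then allocating "fun12" yields the same slot, so the second table_put() fails even though the two declarations are unrelated.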

[Bug target/115562] RISC-V: ICE because of reused fndecl with target-arch attribute

2024-06-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115562

--- Comment #1 from Christoph Müllner  ---
This issue was discovered while analyzing a build issue with a patchset
to introduce optimized string processing routines for RISC-V in glibc.

See also:
  https://sourceware.org/pipermail/libc-alpha/2024-June/157627.html

[Bug target/115554] RISC-V: ICE in case of multiple target-arch attributes

2024-07-16 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115554

Christoph Müllner  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Christoph Müllner  ---
Fixed upstream with:
* https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aa8e2de78cae4dca7f9b0efe0685f3382f9ecb9a
* https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=61c21a719e205f70bd046c6a0275d1a3fd6341a4

Backported to GCC-14:
* https://gcc.gnu.org/git?p=gcc.git;a=commit;h=0e1f599d637668bba0b2890f4cd81e7fb70473bc
* https://gcc.gnu.org/git?p=gcc.git;a=commit;h=b3cff8357e9dce680a20406698fa9dadfe04997d

[Bug target/115562] RISC-V: ICE because of reused fndecl with target-arch attribute

2024-07-16 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115562

Christoph Müllner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Christoph Müllner  ---
Fixed upstream with:
* https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aa8e2de78cae4dca7f9b0efe0685f3382f9ecb9a
* https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=61c21a719e205f70bd046c6a0275d1a3fd6341a4

Backported to GCC-14:
* https://gcc.gnu.org/git?p=gcc.git;a=commit;h=0e1f599d637668bba0b2890f4cd81e7fb70473bc
* https://gcc.gnu.org/git?p=gcc.git;a=commit;h=b3cff8357e9dce680a20406698fa9dadfe04997d

[Bug target/116035] [14/15] RISC-V: -march=rv64g_xtheadmemidx_zba generates illegal lwu insn

2024-07-23 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116035

Christoph Müllner  changed:

   What|Removed |Added

   Last reconfirmed||2024-07-23
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Christoph Müllner  ---
Thanks for reporting.
It seems that a Zba insn pattern matches here and causes the trouble.
I'll prepare a fix.

[Bug target/116035] [14/15] RISC-V: -march=rv64g_xtheadmemidx_zba generates illegal lwu insn

2024-07-24 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116035

--- Comment #2 from Christoph Müllner  ---
Proposed fix has been posted on the mailing list:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658091.html

[Bug target/116035] [14/15] RISC-V: -march=rv64g_xtheadmemidx_zba generates illegal lwu insn

2024-07-24 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116035

--- Comment #4 from Christoph Müllner  ---
My understanding is that both GCC 14 and GCC 15 (i.e., master) show this issue.

Testing with GCC 13 shows:
  error: '-march=rv64g_xtheadmemidx_zba': unexpected ISA string at end: 'zba'
The issue does not apply to GCC 13 or older because the affected extensions
were not supported back then.

By the way, thanks for reminding me of the GCC 14 backport.

[Bug target/116033] [14/15] RISC-V: -march=rv64gv_xtheadmemidx generates illegal vse8.v insn

2024-07-24 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116033

Christoph Müllner  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-07-24
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org

--- Comment #1 from Christoph Müllner  ---
I've prepared a patch that disables pre-/post-modify addressing if RVV is
enabled:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658119.html

The underlying issue is outlined in the commit message.
We are confronted with the following optimization from auto_inc_dec (-O3),
when RVV and XTheadMemIdx are enabled:
```
(insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM  [(char *)_39]+0 S4 A32])
(reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27
3183 {*movv4qi}
 (nil))
(insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
(plus:DI (reg:DI 136 [ ivtmp.13 ])
(const_int 20 [0x14]))) 5 {adddi3}
 (nil))
>
(insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
(plus:DI (reg:DI 136 [ ivtmp.13 ])
(const_int 20 [0x14]))) [0 MEM  [(char
*)_39]+0 S4 A32])
(reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27
3183 {*movv4qi}
 (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
(nil)))
```

One solution would be to introduce a target hook to check if a certain
type can be used for pre-/post-modify optimizations.
However, it will be hard to justify such a hook if only a single
RISC-V vendor extension requires that.
Therefore, this patch takes a more drastic approach and disables
pre-/post-modify addressing if TARGET_VECTOR is set.
This results in not emitting pre-/post-modify instructions from
XTheadMemIdx if RVV is enabled.

[Bug target/116033] [14/15] RISC-V: -march=rv64gv_xtheadmemidx generates illegal vse8.v insn

2024-07-24 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116033

--- Comment #2 from Christoph Müllner  ---
Jeff Law suggested that th_classify_address() is likely missing a mode check.
I had checked that before, and there is a mode check there.
But after this comment, I took a closer look at that check, and indeed:
  if (!(INTEGRAL_MODE_P (mode) && GET_MODE_SIZE (mode).to_constant () <= 8))
    return false;
INTEGRAL_MODE_P() also includes integer vector modes (such as V4QI).
So, the proper fix for this issue is to ensure that GET_MODE_CLASS (mode) ==
MODE_INT holds.

I adjusted the patch and provided a v2:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658130.html
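The difference between the two checks can be modeled in a few lines. This is a simplified, hypothetical model of GCC's mode classes (the real INTEGRAL_MODE_P() also covers partial and complex integer modes); it only shows why a 4-byte integer vector mode like V4QI passes the quoted check but not one based on MODE_INT.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of a few GCC mode classes. */
enum mode_class { MODE_INT, MODE_FLOAT, MODE_VECTOR_INT, MODE_VECTOR_FLOAT };

struct mode_info {
    const char *name;
    enum mode_class cls;
    unsigned size; /* in bytes */
};

/* Like INTEGRAL_MODE_P(), this also accepts integer vector modes. */
static bool integral_mode_p(const struct mode_info *m)
{
    return m->cls == MODE_INT || m->cls == MODE_VECTOR_INT;
}

/* The quoted check: integral and at most 8 bytes, so V4QI slips through. */
static bool old_mode_ok(const struct mode_info *m)
{
    return integral_mode_p(m) && m->size <= 8;
}

/* Fixed check: only the scalar integer class MODE_INT is accepted. */
static bool new_mode_ok(const struct mode_info *m)
{
    return m->cls == MODE_INT && m->size <= 8;
}
```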

[Bug target/116035] [14/15] RISC-V: -march=rv64g_xtheadmemidx_zba generates illegal lwu insn

2024-07-24 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116035

Christoph Müllner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Christoph Müllner  ---
The patch got accepted and has been merged:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=9817d29cd66762893782a52b2c304c5083bc0023

A GCC 14 backport was accepted as well and has been merged:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ab0386679fef35c544d139270436c63026e00ff2

Thanks again for reporting!

[Bug target/116131] [14/15 Regression] RISC-V: Unrecognizable insn with xtheadmemidx on rv32

2024-07-30 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116131

--- Comment #2 from Christoph Müllner  ---
Thank you for reporting!

A first analysis showed that adding more extensions does not change anything.
E.g. rv32gc_xtheadmemidx also triggers the error. However, rv64i_xtheadmemidx
is not affected. Also, the optimization level has no impact on the issue.

(In reply to Jeffrey A. Law from comment #1)
> Looks like non-canonical RTL to me.  Inside a MEM that shift should have
> been turned into a multiply.

I agree, but I don't think it is part of the problem, because in
th_memidx_classify_address_index() we expect ASHIFTs.

When looking at the XTheadMemIdx implementation, we have two relevant files:
* gcc/config/riscv/thead.md has the optimization pattern (th_memidx_*)
* gcc/config/riscv/thead.cc processes them (th_memidx_classify_address_index())

In this particular case, th_memidx_I_c creates the optimized INSN:

(insn 18 14 0 2 (set (mem:SI (plus:SI (reg/f:SI 141)
(ashift:SI (subreg:SI (reg:DI 134 [ a.0_1 ]) 0)
(const_int 2 [0x2]))) [0  S4 A32])
(reg:SI 143 [ b ])) "":4:17 -1
 (nil))

The goal is obviously to generate a th.srw instruction.
The issue here is the subreg, which comes from the following INSN:

(insn 9 7 10 2 (set (reg:SI 139)
(ashift:SI (subreg:SI (reg:DI 134 [ a.0_1 ]) 0)
(const_int 2 [0x2])))
"gcc/testsuite/gcc.target/riscv/pr116131.c":12:8 294 {*ashlsi3}
 (expr_list:REG_DEAD (reg:DI 134 [ a.0_1 ])
(nil)))

An easy fix is to reject subregs (confirmed to work):

diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index a47fe6f28b8..b95959d6827 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -758,6 +758,7 @@ (define_insn_and_split "*th_memidx_I_c"
   (match_operand:X 3 "register_operand" "r")))
 (match_operand:TH_M_ANYI 0 "register_operand" "r"))]
   "TARGET_XTHEADMEMIDX
+   && !SUBREG_P (operands[1])
&& CONST_INT_P (operands[2])
&& pow2p_hwi (INTVAL (operands[2]))
&& IN_RANGE (exact_log2 (INTVAL (operands[2])), 1, 3)"
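The insn condition above only accepts a constant operands[2] that is a power of two whose log2 lies in 1..3, i.e. a scale of 2, 4 or 8 that can be folded into the shift amount of an indexed access. A small model of that condition and the resulting address; scale_to_shift() and indexed_address() are hypothetical helpers, and the formula base + (index << shift) is an assumption about the XTheadMemIdx indexed forms such as th.srw.

```c
#include <stdint.h>

/* Returns log2(SCALE) if SCALE is a power of two with log2 in [1, 3]
   (i.e. 2, 4 or 8), mirroring the quoted insn condition; -1 otherwise. */
static int scale_to_shift(int64_t scale)
{
    if (scale <= 0 || (scale & (scale - 1)) != 0)
        return -1;                 /* not a power of two */
    int shift = 0;
    while ((scale >>= 1) != 0)
        shift++;
    return (shift >= 1 && shift <= 3) ? shift : -1;
}

/* Effective address of an indexed access: base + (index << shift). */
static uint32_t indexed_address(uint32_t base, uint32_t index, int shift)
{
    return base + (index << shift);
}
```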

A better alternative is to allow this subreg.
I've prepared a patch and will send it once the tests are done.

[Bug target/116131] [14/15 Regression] RISC-V: Unrecognizable insn with xtheadmemidx on rv32

2024-07-30 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116131

Christoph Müllner  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-07-30
   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org

[Bug target/116590] unrecognized opcode th.vmv8r.v th.vfrec7.v when compiling for risc-v xtheadvector

2024-10-17 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116590

Christoph Müllner  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |cooper.qu at linux dot alibaba.com
 Ever confirmed|0   |1
   Last reconfirmed|2024-09-04 00:00:00 |2024-10-17

[Bug target/116591] internal compiler error: in extract_insn when compiling for risc-v xtheadvector

2024-10-17 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116591

Christoph Müllner  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Keywords||ice-on-valid-code
   Last reconfirmed||2024-10-17
   Assignee|unassigned at gcc dot gnu.org  |cooper.qu at linux dot alibaba.com
 Ever confirmed|0   |1

[Bug target/116593] internal compiler error: in get_attr_type, at config/riscv/riscv.md:28048 with -O2 -O3 when compiling for risc-v xtheadvector

2024-10-17 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116593

Christoph Müllner  changed:

   What|Removed |Added

 Target|Riscv   |riscv
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |cooper.qu at linux dot alibaba.com
   Last reconfirmed||2024-10-17

[Bug target/116347] [13/14/15 only] RISC-V: Duplicate entries for -mtune in --target-help

2024-10-17 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116347

Christoph Müllner  changed:

   What|Removed |Added

   Last reconfirmed||2024-10-17
 CC||cmuellner at gcc dot gnu.org

--- Comment #1 from Christoph Müllner  ---
Currently, the string "thead-c906" is used as an identifier for a CPU
(RISCV_CORE) and a tuning (RISCV_TUNE). The help message lists under "valid
arguments for -mtune= option" all tuning identifiers followed by all CPU
identifiers.

I don't think changing the identifier is the right thing to do, as it would
break people's build scripts.

Instead, I think it would be better to make the help-string-generator aware of
such cases.

Looking into the code, this should not be too hard:
riscv_get_valid_option_values() in gcc/common/config/riscv/riscv-common.cc
needs to be adjusted (in case OPT_mtune_) to avoid adding duplicates to the
result vector. The function vec_safe_iterate() should help to iterate over
existing entries. The duplication check needs to use strcmp() as the vector
elements are const-char-pointers.
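A minimal standalone sketch of such a duplicate check follows. It does not use GCC's vec<> API: the fixed-size array, struct str_vec, and the str_vec_* helpers are hypothetical stand-ins for the result vector built in riscv_get_valid_option_values(). The point it illustrates is that entries must be compared by contents with strcmp(), since two equal identifiers may be stored behind different pointers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GCC's vec<const char *>; the real code would
   walk the existing entries with vec_safe_iterate ().  */
#define MAX_VALUES 64

struct str_vec
{
  const char *elts[MAX_VALUES];
  size_t len;
};

/* Return 1 if an entry equal to S (compared by contents, not by pointer)
   is already present in V, else 0.  */
static int
str_vec_contains (const struct str_vec *v, const char *s)
{
  for (size_t i = 0; i < v->len; i++)
    if (strcmp (v->elts[i], s) == 0)
      return 1;
  return 0;
}

/* Append S to V unless an equal string is already present (or V is full).  */
static void
str_vec_push_unique (struct str_vec *v, const char *s)
{
  if (v->len < MAX_VALUES && !str_vec_contains (v, s))
    v->elts[v->len++] = s;
}
```

Pushing all tuning identifiers and then all CPU identifiers through str_vec_push_unique() would list "thead-c906" only once in the help text.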

[Bug target/116720] [13/14/15 Regression] RISC-V: Unrecognizable insn with xtheadmemidx on rv32

2024-10-17 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116720

Christoph Müllner  changed:

   What|Removed |Added

   Last reconfirmed||2024-10-17
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

[Bug target/111565] ICE: in riscv_expand_strcmp_scalar, at config/riscv/riscv-string.cc:382 with -mcpu=thead-c906 -minline-strncmp

2024-10-17 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111565

Christoph Müllner  changed:

   What|Removed |Added

 CC||cmuellner at gcc dot gnu.org
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org
   Last reconfirmed||2024-10-17
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Christoph Müllner  ---
GCC 14 (I just reproduced on GCC 14.1 and 14.2) is affected. Master (GCC 15)
does not have any issues.

[Bug target/116347] [13/14/15 only] RISC-V: Duplicate entries for -mtune in --target-help

2024-10-22 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116347

Christoph Müllner  changed:

   What|Removed |Added

   Last reconfirmed|2024-10-17 00:00:00 |2024-10-22
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #2 from Christoph Müllner  ---
I just noticed that a proposal to address this was posted to the list in
mid-August:
 
https://patchwork.sourceware.org/project/gcc/patch/20240819081442.1955204-1-shiyul...@iscas.ac.cn/

This patch adds the suffix "-series" to tuning identifiers that are already
used as CPU identifiers (e.g. "thead-c906" -> "thead-c906-series"). Jeff
questioned whether CPU core identifiers should be listed (and accepted) as
strings for -mtune. Palmer wrote that he would have a look.

Here's a quick overview of what other backends do with mcpu/mtune:
* aarch64|arm|rs6000/PowerPC: mtune and mcpu flags accept the same identifiers.
mtune selects the tuning struct. mcpu additionally sets the enabled extensions
(similar to march).
* riscv: same as above, but additional identifiers for tuning structs exist
that are accepted for mtune.
* mips: No -mcpu flag. mtune selects the tuning struct.
* x86: mcpu is deprecated and behaves like mtune. mtune sets the tuning struct.
march selects extensions and tuning and accepts the same identifiers as mtune.

I still think that simply suppressing duplicates when generating the help text
would be the solution with the least user impact.

[Bug c/109393] Very trivial address calculation does not fold

2024-09-25 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109393

Christoph Müllner  changed:

   What|Removed |Added

 CC||cmuellner at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #11 from Christoph Müllner  ---
This has been fixed upstream.

[Bug tree-optimization/114326] Missed optimization for A || B when !B implies A.

2024-09-25 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114326

Christoph Müllner  changed:

   What|Removed |Added

 CC||cmuellner at gcc dot gnu.org
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Christoph Müllner  ---
Fixed on master.

[Bug tree-optimization/117830] [15 Regression] Miscompilation of 464.h264ref at -O2 -march=generic since r15-5563-g1c4d39ada33d36

2024-12-05 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117830

--- Comment #5 from Christoph Müllner  ---
Thank you for reporting this!

I can reproduce this issue on x86_64 (I did not test on other architectures).
I have also confirmed that the suspected change (1c4d39ada33d) causes this
by validating that reverting the change fixes the miscompare.

An initial analysis showed that we have a total of four blends in CPU2006's
h264:
* 3x build_base_gcc43-64bit./block.c.213t.forwprop4
* 1x build_base_gcc43-64bit./macroblock.c.213t.forwprop4

Looking closer at the dump files of forwprop, the issue becomes apparent:
In find_sad_16x16 (macroblock.c), we merge two sequences that both utilize
three of four lanes.

  _230 = VEC_PERM_EXPR ;
  _238 = VEC_PERM_EXPR ;
  vect__108.3193_321 = _238 - _230;
  vect__107.3192_225 = _230 + _238;
  _317 = VEC_PERM_EXPR ;
  // { 0, 5, 2, 7 } could be narrowed to { 0, 5, 0, 4 }

  _263 = VEC_PERM_EXPR ;
  _294 = VEC_PERM_EXPR ;
  vect__109.3191_252 = _294 - _263;
  vect__104.3190_257 = _263 + _294;
  _247 = VEC_PERM_EXPR ;
  // { 0, 5, 2, 7 } could be narrowed to { 0, 5, 0, 4 }

This means that the check of whether we utilize less than half of the lanes in
a sequence is wrong.
Looking into the code confirms that this is indeed the case.
I already have a fix that is currently being tested.
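A minimal sketch of the kind of precondition involved, assuming each sequence's lane usage is available as an array of flags and that a blend is only valid when every sequence occupies at most half of the NELTS lanes (so the lanes of one fit into the free lanes of the other). The names are hypothetical and do not mirror the forwprop code. With three of four lanes used on each side, as in find_sad_16x16, such a check must reject the blend.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the lanes flagged as used in USED (NELTS flags).  */
static size_t
count_used_lanes (const bool *used, size_t nelts)
{
  size_t n = 0;
  for (size_t i = 0; i < nelts; i++)
    if (used[i])
      n++;
  return n;
}

/* Hypothetical blend precondition: both sequences must use at most half
   of the NELTS lanes, otherwise their lanes cannot share one vector.  */
static bool
can_blend (const bool *used_a, const bool *used_b, size_t nelts)
{
  return 2 * count_used_lanes (used_a, nelts) <= nelts
         && 2 * count_used_lanes (used_b, nelts) <= nelts;
}
```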

[Bug tree-optimization/117830] [15 Regression] Miscompilation of 464.h264ref at -O2 -march=generic since r15-5563-g1c4d39ada33d36

2024-12-19 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117830

--- Comment #6 from Christoph Müllner  ---
Patch on list:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672065.html

[Bug target/116347] [13/14/15 only] RISC-V: Duplicate entries for -mtune in --target-help

2024-12-19 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116347

Christoph Müllner  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #3 from Christoph Müllner  ---
Patch on list:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672062.html

[Bug tree-optimization/117830] [15 Regression] Miscompilation of 464.h264ref at -O2 -march=generic since r15-5563-g1c4d39ada33d36

2024-12-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117830

Christoph Müllner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Christoph Müllner  ---
The patch was approved and has been pushed to master:
 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=eee2891312a9b42acabcc82739604c9fa8421757

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-12-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 117830, which changed state.

Bug 117830 Summary: [15 Regression] Miscompilation of 464.h264ref at -O2 -march=generic since r15-5563-g1c4d39ada33d36
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117830

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/118149] [15 regression] ICE when building lsp-plugins-1.2.14 (mmap: Cannot allocate memory in forwprop) since r15-5563

2024-12-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118149

Christoph Müllner  changed:

   What|Removed |Added

   Last reconfirmed||2024-12-20
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1

--- Comment #8 from Christoph Müllner  ---
Thanks for reporting!

I've analyzed this, and indeed, this got fixed with the recent fix for
PR117830.

When calculating the lane allocation for the blended sequence, we did:
  while (lane_assignment[l] != 0)
    l++;
That got fixed so that we no longer access out of bounds.
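A bounded version of that search can be sketched as follows. The nelts parameter and the find_free_lane() wrapper are hypothetical and not the committed code; the idea shown is that testing the bound before reading the array keeps the loop from running past the end when every lane is already assigned.

```c
#include <stddef.h>

/* Return the index of the first free (zero) entry in LANE_ASSIGNMENT at or
   after START, or NELTS if all remaining lanes are taken.  The bound is
   checked first, so lane_assignment[nelts] is never read.  */
static size_t
find_free_lane (const unsigned *lane_assignment, size_t nelts, size_t start)
{
  size_t l = start;
  while (l < nelts && lane_assignment[l] != 0)
    l++;
  return l;
}
```

A caller would treat a return value equal to nelts as "no free lane" and give up on the blend instead of writing past the end.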

I've sent a patch that adds the reduced testcase to the test suite.

[Bug target/116347] [13/14/15 only] RISC-V: Duplicate entries for -mtune in --target-help

2024-12-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116347

Christoph Müllner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Christoph Müllner  ---
Patch was accepted and has been pushed to master:
 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8af296c290216e03bc20e7291e64c19e0d94cfd6

[Bug tree-optimization/118149] [15 regression] ICE when building lsp-plugins-1.2.14 (mmap: Cannot allocate memory in forwprop) since r15-5563

2024-12-20 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118149

Christoph Müllner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Christoph Müllner  ---
The new tests have been pushed to master:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672123.html

[Bug other/117728] [15 regression] new test case gcc.dg/tree-ssa/satd-hadamard.c from r15-5563-g1c4d39ada33d36 fails

2024-11-21 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117728

Christoph Müllner  changed:

   What|Removed |Added

   Last reconfirmed||2024-11-21
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |cmuellner at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Christoph Müllner  ---
Thanks for reporting!

This is an issue with the test case, not with the code generation.
The test expects the transformation to succeed, but there are valid reasons
why it fails on some platforms.

Similar failures could occur in the other tests (vector-8.c and vector-9.c).

I'll send out a patch that limits the target architectures to aarch64 and
x86-64, where we know the tests pass.

[Bug testsuite/117728] [15 regression] new test case gcc.dg/tree-ssa/satd-hadamard.c from r15-5563-g1c4d39ada33d36 fails

2024-11-21 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117728

Christoph Müllner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #2 from Christoph Müllner  ---
Should be fixed with
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ae0d842f3e7a119b21a000824b10920614088684

[Bug rtl-optimization/117922] [15 Regression] 1000% compilation time slow down on the testcase from pr26854

2025-01-29 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117922

--- Comment #17 from Christoph Müllner  ---
I reproduced the slow-down with a recent master on a 5950X:
* no-mem-fold-offset: 4m58.226s
* mem-fold-offset: 11m19.311s (+127%)

More details from -ftime-report:
* no-mem-fold-offset: df reaching defs   :   9.34 (  3%) 0  (  0%)
* mem-fold-offset: df reaching defs  : 381.40 ( 55%) 0  (  0%)

A look at the detailed time report (-ftime-report -ftime-report-details) shows:

Time variable  wall   GGC
[...]
 phase opt and generate : 682.81 ( 99%)  6175M ( 97%)
 [...]
 callgraph functions expansion  : 646.99 ( 94%)  5695M ( 89%)
[...]
 fold mem offsets   :   1.73 (  0%)   679k (  0%)
 `- CFG verifier:   2.10 (  0%) 0  (  0%)
 `- df use-def / def-use chains :   2.32 (  0%) 0  (  0%)
 `- df reaching defs: 370.68 ( 54%) 0  (  0%)
 `- verify RTL sharing  :   0.05 (  0%) 0  (  0%)
[...]
 TOTAL  : 690.06 6365M

I read this as "fold mem offset utilizes 0% of memory", so there is no issue
with the memory footprint.

To confirm this, `time -v` was used:
* no-mem-fold-offset: Maximum resident set size (kbytes): 15563684
* mem-fold-offset: Maximum resident set size (kbytes): 15564364

I looked at the pass, and a few things could be cleaned up in the pass itself
(e.g., redundant calls). However, that won't change anything in the observed
performance.
The time-consuming part is the UD+DU DF analysis for the whole function.
Even if the pass returned 0 right after the analysis, doing nothing else,
we would end up with the same run time (confirmed by measurement).

The pass operates at BB granularity, so DF analysis of the whole function
provides more information than needed. When going through the documentation, I
came across df_set_blocks(), which I expected to reduce the problem size
significantly.
So, I moved the df_analyze() call into the FOR_ALL_BB_FN() loop, right after a
call to df_set_blocks(), with the intent of having only a single block set per
iteration.
However, that triggered a few ICEs in DF, and once those were bypassed, it ended
up in practical non-termination (i.e., df_set_blocks() does not make the calls
to df_analyze() significantly cheaper).

My conclusion:
This can only be fixed by not using DF analysis and implementing a
pass-specific analysis.

So far, I have not found a good solution for this. But I haven't looked at all
the suggestions in detail. Can someone help me find what Paolo referenced as
"the multiple definitions DF problem that was introduced for fwprop in 2009"?

[Bug tree-optimization/117079] [15 Regression] FAIL: gcc.target/i386/pr105493.c since r15-2820-gab18785840d7b8

2025-01-14 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117079

--- Comment #4 from Christoph Müllner  ---
The reason that we don't have "MEM " in the dump
anymore is that we now have "MEM ".

Further, the size of the function in the test case shrinks from 225
instructions down to 109 (almost all vector instructions).

I tried to measure a performance difference on my 5950X (-march=native) when
calling the test function four times in a loop with 1024l * 1024 * 1024 * 1024
iterations.
However, I did not see enough evidence to claim that the new code is better
(memory bandwidth is probably the limit):

* old: 4m34.405s, 4m47.825s, 4m38.187s
* new: 4m34.722s, 4m34.936s, 4m34.922s
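To put these numbers in perspective, the sketch below converts the six reported wall-clock times to seconds and compares the difference of the means with the spread of the old runs. It is plain arithmetic on the measurements above; the helper names are hypothetical. The old runs average about 280.14 s and the new ones about 274.86 s, and the resulting 5.28 s gap is smaller than the 13.42 s range of the old runs, which is consistent with not claiming an improvement.

```c
#include <stddef.h>

/* Convert a time printed as "<minutes>m<seconds>s" into seconds.  */
static double
to_seconds (int minutes, double seconds)
{
  return minutes * 60.0 + seconds;
}

/* Arithmetic mean of the N values in X.  */
static double
mean (const double *x, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += x[i];
  return sum / (double) n;
}

/* Max minus min of the N values in X (N >= 1): a crude measure of
   run-to-run noise.  */
static double
spread (const double *x, size_t n)
{
  double lo = x[0], hi = x[0];
  for (size_t i = 1; i < n; i++)
    {
      if (x[i] < lo)
        lo = x[i];
      if (x[i] > hi)
        hi = x[i];
    }
  return hi - lo;
}
```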

I propose to fix the failing test case by fixing the test condition.
A patch for that is on the list:
  https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673551.html

FWIW, here is a small code change that will bring back the old behavior for
analysis:

--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2595,7 +2595,7 @@ out:
   auto_vec two_op_perm_indices[2];
   vec two_op_scalar_stmts[2] = {vNULL, vNULL};

-  if (two_operators && oprnds_info.length () == 2 && group_size > 2)
+  if (false && two_operators && oprnds_info.length () == 2 && group_size > 2)
 {
   unsigned idx = 0;
   hash_map seen;

[Bug tree-optimization/118487] [15 Regression] ICE tree check: expected vector_cst, have ssa_name in vector_cst_encoded_nelts, at tree.h:4683 since r15-5563-g1c4d39ada33d36

2025-01-15 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118487

Christoph Müllner  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #2 from Christoph Müllner  ---
I can reproduce this ICE.

The issue comes from the uninitialized mask (or selector) of the vector shuffle
(or permutation).
Uninitialized means that values might exceed the number of possible elements.

The documentation states, "The elements of mask are considered modulo N in the
single-operand case and modulo 2*N in the two-operand case."

However, we don't perform this modulo operation on the indices found in the
mask.
I will fix this.
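The quoted rule can be illustrated with a small sketch that reduces every selector element modulo N (one input) or 2*N (two inputs). It assumes a selector with one element per lane and N lanes per input; the function name is hypothetical. As comment #3 on this bug notes, this sanitization turned out to be handled by ccp1, so this only illustrates the documented semantics, not the eventual fix.

```c
#include <stddef.h>

/* Reduce each of the N elements of MASK modulo N for a single-operand
   permutation, or modulo 2*N for a two-operand one, so that every index
   names an existing input element.  */
static void
reduce_perm_mask (unsigned *mask, size_t n, int two_operands)
{
  size_t limit = two_operands ? 2 * n : n;
  for (size_t i = 0; i < n; i++)
    mask[i] = (unsigned) (mask[i] % limit);
}
```

For example, with N = 4 and two operands, the selector { 0, 9, 2, 15 } becomes { 0, 1, 2, 7 }.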

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2025-01-15 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 117079, which changed state.

Bug 117079 Summary: [15 Regression] FAIL: gcc.target/i386/pr105493.c since r15-2820-gab18785840d7b8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117079

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/117079] [15 Regression] FAIL: gcc.target/i386/pr105493.c since r15-2820-gab18785840d7b8

2025-01-15 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117079

Christoph Müllner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Christoph Müllner  ---
Fixed in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=120a37008222bf6fe17658af3d1ba1b384642905

[Bug tree-optimization/118487] [15 Regression] ICE tree check: expected vector_cst, have ssa_name in vector_cst_encoded_nelts, at tree.h:4683 since r15-5563-g1c4d39ada33d36

2025-01-16 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118487

Christoph Müllner  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Christoph Müllner  ---
Fixed with
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b42eeef63a7e88f90e6ecab9c541b96146759b8c

Thanks for reporting!

[Bug tree-optimization/118487] [15 Regression] ICE tree check: expected vector_cst, have ssa_name in vector_cst_encoded_nelts, at tree.h:4683 since r15-5563-g1c4d39ada33d36

2025-01-15 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118487

--- Comment #3 from Christoph Müllner  ---
My initial comment about the need to sanitize the mask elements of
VEC_PERM_EXPR was correct, but there is nothing to be done for that, because
this is handled by ccp1.

The ICE reported here comes from not checking the TREE_CODE of the mask
tree.
I've sent a fix for that to the list:
  https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673703.html

While analysing this, I noticed that we make redundant calls to to_constant(),
which is addressed here:
  https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673702.html

[Bug target/119587] RISC-V: XTheadMemIdx: ICE on valid code with asm operands

2025-04-02 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119587

Christoph Müllner  changed:

   What|Removed |Added

 CC||cmuellner at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Keywords||ice-on-valid-code
   Last reconfirmed||2025-04-02

[Bug target/119587] New: RISC-V: XTheadMemIdx: ICE on valid code with asm operands

2025-04-02 Thread cmuellner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119587

Bug ID: 119587
   Summary: RISC-V: XTheadMemIdx: ICE on valid code with asm operands
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cmuellner at gcc dot gnu.org
  Target Milestone: ---

Bohan Lei reported this ICE along with a patch [1] to fix it.

Reproducer:
// gcc -Ofast -march=rv64gc_xtheadmemidx
int a;
int **b;
int**
c ()
{
  int **e = &b[(unsigned)(long)&a];
  __asm__ ("" : "+A"(*e));
  return 0;
}

Replacing "return 0" with "return e" avoids the ICE.

The underlying issue is that the combiner's output cannot
be lowered later on (which triggers the ICE in LRA).
Bohan's patch attempts to address the ICE with a splitter.
However, it has not yet been decided whether that's the right way.
Jeff Law has started a discussion in [2].

Since this has already become more than a simple fix, I'm opening a ticket
here.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2025-March/678933.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2025-April/679950.html