[Bug c++/113719] [13/14/15 regression] g++.target/i386/pr103696.C FAILs

2024-05-14 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #4 from Hongyu Wang  ---
Created attachment 58211
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58211&action=edit
A patch

Hi Rainer,

Could you try the attached patch and see whether it resolves the error? I tested
with a cross-compiled Solaris GCC, but it hit an error in varasm for 64-bit, so
I'm not sure it passes all 32/64-bit tests.

[Bug target/113719] [13/14/15 regression] g++.target/i386/pr103696.C FAILs

2024-05-15 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

Hongyu Wang  changed:

              What |Removed      |Added

            Status |UNCONFIRMED  |ASSIGNED
  Last reconfirmed |             |2024-05-16
    Ever confirmed |0            |1
         Component |c++          |target

[Bug target/113719] [13/14/15 regression] g++.target/i386/pr103696.C FAILs

2024-05-30 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

Hongyu Wang  changed:

        What |Removed   |Added

      Status |ASSIGNED  |RESOLVED
  Resolution |---       |FIXED

--- Comment #10 from Hongyu Wang  ---
Fixed on GCC13/14/15.

[Bug target/115341] [15 regression] gcc.target/i386/apx-ndd-2.c etc. FAIL

2024-06-06 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115341

Hongyu Wang  changed:

        What |Removed      |Added

          CC |             |hongyuw at gcc dot gnu.org
  Resolution |---          |FIXED
      Status |UNCONFIRMED  |RESOLVED

--- Comment #3 from Hongyu Wang  ---
Fixed on trunk so far.

[Bug target/115370] New: [15 regression] gcc.target/i386/pr77881.c FAIL

2024-06-06 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

            Bug ID: 115370
           Summary: [15 regression] gcc.target/i386/pr77881.c FAIL
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hongyuw at gcc dot gnu.org
  Target Milestone: ---

After x86 ccmp support was added in r15-1060-g0b6cea8783b9e1, there is a new
failure:

FAIL: gcc.target/i386/pr77881.c scan-assembler js[ \t].?L

The codegen changed from

testq   %rdi, %rdi
js      .L4
testl   %edx, %edx
jne     .L4
ret

to

shrq    $63, %rdi
testl   %edx, %edx
setne   %al
orb     %dil, %al
jne     .L11
ret
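
To make the two sequences easier to compare, here is a minimal C sketch of the
condition they appear to compute. It is not the source of pr77881.c; the
function names and the reading of %rdi and %edx as a 64-bit value `a` and an
int `c` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Short-circuit form, mirroring testq/js followed by testl/jne.  */
static bool
cond_jumpy (int64_t a, int c)
{
  if (a < 0)
    return true;
  if (c != 0)
    return true;
  return false;
}

/* Combined form, mirroring shrq $63 / setne / orb and a single jne.  */
static bool
cond_combined (int64_t a, int c)
{
  uint8_t sign = (uint8_t) ((uint64_t) a >> 63); /* 1 if a is negative */
  uint8_t nonzero = (uint8_t) (c != 0);          /* 1 if c is nonzero */
  return (sign | nonzero) != 0;
}

/* Self-check helper: both forms must agree for the given input.  */
static void
check_same (int64_t a, int c)
{
  assert (cond_jumpy (a, c) == cond_combined (a, c));
}
```

Both forms give the same answer; the test only fails because it scans for the
js instruction that the combined form no longer emits.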

[Bug target/115370] [15 regression] gcc.target/i386/pr77881.c FAIL

2024-06-06 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

Hongyu Wang  changed:

    What |Removed  |Added

  Target |         |x86_64-*-*, i?86-*-*

--- Comment #1 from Hongyu Wang  ---
The issue is in cfgexpand.cc:2648

/* If jumps are cheap and the target does not support conditional
   compare, turn some more codes into jumpy sequences.  */
else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4
         && targetm.gen_ccmp_first == NULL)
  {

x86 now defines targetm.gen_ccmp_first, but that does not mean ccmp is enabled
by default, since it requires -mapxf.

I guess we need a new target hook, have_ccmp.
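
A toy model of the idea, as a sketch only: the struct, the `toy_` names and the
flag variable below are hypothetical and are not GCC's real target hook vector
or the eventual patch. It shows why testing the function pointer against NULL
is not enough once a backend always provides the expander, and how a separate
predicate hook could answer "is ccmp usable with the current flags?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the target hook vector.  */
struct toy_target
{
  void *(*gen_ccmp_first) (void); /* non-NULL once the backend has an expander */
  bool (*have_ccmp) (void);       /* hypothetical predicate hook */
};

/* Stand-in for "-mapxf was given"; off by default, as on x86.  */
static bool toy_apxf_enabled = false;

static void *toy_x86_gen_ccmp_first (void) { return NULL; }
static bool toy_x86_have_ccmp (void) { return toy_apxf_enabled; }

static const struct toy_target toy_x86 = {
  toy_x86_gen_ccmp_first,
  toy_x86_have_ccmp
};

/* Old check: any target that defines the expander loses jumpy sequences.  */
static bool
toy_use_jumpy_old (const struct toy_target *t, int branch_cost)
{
  assert (t != NULL);
  return branch_cost < 4 && t->gen_ccmp_first == NULL;
}

/* Sketched check: ask whether ccmp is actually available right now.  */
static bool
toy_use_jumpy_new (const struct toy_target *t, int branch_cost)
{
  assert (t != NULL);
  bool ccmp = t->have_ccmp != NULL && t->have_ccmp ();
  return branch_cost < 4 && !ccmp;
}
```

With the flag off, the old check refuses the jumpy sequence for `toy_x86`
while the sketched check still allows it, which matches the regression
described above.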

[Bug target/112943] [14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1176 with -O2 -march=westmere -mapxf

2023-12-11 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112943

Hongyu Wang  changed:

        What |Removed      |Added

  Resolution |---          |FIXED
      Status |UNCONFIRMED  |RESOLVED

--- Comment #5 from Hongyu Wang  ---
Fixed on trunk.

[Bug target/112943] [14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1176 with -O2 -march=westmere -mapxf

2023-12-18 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112943

--- Comment #6 from Hongyu Wang  ---
(In reply to Hongtao Liu from comment #3)
> (In reply to Jakub Jelinek from comment #1)
> > Why does ix86_expand_binary_operator have the use_ndd argument at all? 
> > Shouldn't it always act as if the argument is TARGET_APX_NDD?
> > Or, any particular reason why it isn't done in ashl3 (but in other
> > shifts/rotates)?
> By the time we support apx_ndd, the use_ndd is introduced to enable ndd
> pattern by pattern so that avoid other patterns crash, and now that we've
> completed the ndd patch, I think we can try to remove it. We need to make
> sure that there is no pattern under TARGET_APX_NDD but force a call to
> ix86_expand_binary_operator with use_ndd as false.

ix86_expand_binary_operator and the other binary fixup functions are not only
applied to legacy insns; they are also used in sse/mmx patterns. If we drop the
parameter, we need to maintain those vector patterns that could potentially
call the fixup functions at the post-reload stage. So from a design
perspective, it is better to involve only the insns related to NDD and not
touch the vector insns.

[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)

2024-02-02 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #4 from Hongyu Wang  ---
Previously I added 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d564198f960a2f5994dde3f6b83d7a62021e49c3

to prohibit the use of several *POFF constants in the NDD add alternative. If
checking ADDR_SPACE_GENERIC avoids the segment-prefix usage, can we drop that
change?

I'd also suggest using a j prefix for all APX-related constraints, like jf.

[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)

2024-02-02 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #6 from Hongyu Wang  ---
(In reply to H.J. Lu from comment #5)
> (In reply to Hongyu Wang from comment #4)
> > Previously I added 
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d564198f960a2f5994dde3f6b83d7a62021e49c3
> > 
> > to prohibit several *POFF constant usage in NDD add alternative. If checking
> > ADDR_SPACE_GENERIC can avoid the seg prefix usage, we can drop that change?
> 
> Are there are any testcases for this change?
> 

Cut down and edited from gcc.dg/torture/tls/tls-test.c:

#include 
__thread int a = 255; 
__thread int *b;
int *volatile a_in_other_thread = (int *) 12345;

void *
thread_func (void *arg)
{
  a_in_other_thread = &a; // Previously this would try to generate addq $a@tpoff, %fs:0, %rax
  a += 11144; // This was not fixed on trunk, as UNSPEC_TPOFF is in the mem operand
  *((int *) arg) = a;

  return (void *)0;
}

[Bug target/113751] -mapxf -mfma4 generates wrong assembly code

2024-02-03 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113751

Hongyu Wang  changed:

  What |Removed  |Added

    CC |         |hongyuw at gcc dot gnu.org

--- Comment #1 from Hongyu Wang  ---
We haven't disabled EGPR for AMD ISAs like XOP/FMA4, as they are not expected
to be used together with APX.

Quoted from Richard's comment

> We haven’t disabled EGPR for 3DNOW/XOP/LWP/FMA4/TBM instructions, as they will
> be co-operated with -mapxf. We can disable EGPR for them if AMD guys requires.

I think most of these are retired by now, so it's unlikely an implementation
providing these and also APX will appear.

[Bug target/115463] [15 regression] 526.blender_r regressed 5% on Zen2 with -Ofast -flto -march=native since r15-1058-gc989e59fc99d99

2024-06-13 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115463

--- Comment #3 from Hongyu Wang  ---
This should be fixed, but I will wait for confirmation from the SPEC results on
znver/skylake.

[Bug target/115463] [15 regression] 526.blender_r regressed 5% on Zen2 with -Ofast -flto -march=native since r15-1058-gc989e59fc99d99

2024-06-13 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115463
Bug 115463 depends on bug 115370, which changed state.

Bug 115370 Summary: [15 regression] gcc.target/i386/pr77881.c FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

        What |Removed  |Added

      Status |NEW      |RESOLVED
  Resolution |---      |FIXED

[Bug target/115370] [15 regression] gcc.target/i386/pr77881.c FAIL

2024-06-13 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

Hongyu Wang  changed:

        What |Removed  |Added

  Resolution |---      |FIXED
      Status |NEW      |RESOLVED

--- Comment #5 from Hongyu Wang  ---
Fixed on GCC15.

[Bug tree-optimization/115256] [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7

2024-06-17 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

Hongyu Wang  changed:

  What |Removed  |Added

    CC |         |hongyuw at gcc dot gnu.org

--- Comment #4 from Hongyu Wang  ---
Part of the dump for create_preheaders before DSE

--
 [local count: 29277718]:
# .MEM_153 = PHI <.MEM_161(4), .MEM_161(3)>
# _15 = PHI <_17(4), 0(3)> 
# _120 = PHI <_19(4), 0(3)>
if (_120 == 0) 
  goto ; [45.64%]   
else   
  goto ; [54.36%]

 [local count: 27536775]:
# _66 = PHI <_15(6), _17(5)>   
# .MEM_125 = PHI <.MEM_153(6), .MEM_164(5)>
_87 = (long unsigned int) _66; 
_88 = _87 * 4; 
_89 = _88 + 8; 
_110 = _89;
# .MEM_167 = VDEF <.MEM_125>   
newmem_111 = malloc (_110);
if (newmem_111 == 0B)  
  goto ; [0.04%] 
else   
  goto ; [99.96%]

 [local count: 11015]:   
# .MEM_168 = VDEF <.MEM_167>   
xmalloc_failed (_110); 

 [local count: 27536775]:
# .MEM_154 = PHI <.MEM_167(7), .MEM_168(8)>
# .MEM_170 = VDEF <.MEM_154>   
MEM[(struct vec_prefix *)newmem_111].alloc = _66;  
# .MEM_171 = VDEF <.MEM_170>   
MEM[(struct vec_prefix *)newmem_111].num = 0;  

 [local count: 39298950]:   
# _91 = PHI <0B(6), newmem_111(9)> 
# .MEM_152 = PHI <.MEM_153(6), .MEM_171(9)>
# .MEM_174 = VDEF <.MEM_152>   
li.to_visit = _91; 
# VUSE <.MEM_174>  
_61 = cfun;
# VUSE <.MEM_174>  
_62 = _61->x_current_loops;
# VUSE <.MEM_174>  
_63 = _62->tree_root;  

 [local count: 77159561]:   
# aloop_80 = PHI <_63(10), _108(26)>   
# .MEM_147 = PHI <.MEM_174(10), .MEM_90(26)>   

 [local count: 701450557]:  
# aloop_64 = PHI
# .MEM_148 = PHI <.MEM_147(11), .MEM_149(16)>  
# VUSE <.MEM_148>  
_65 = aloop_64->num;   
if (_65 > 0)   
  goto ; [50.00%]   
else   
  goto ; [50.00%]   

 [local count: 350725279]:  
if (_91 != 0B) 
  goto ; [70.00%]   
else   
  goto ; [30.00%]   

 [local count: 245507696]:  
_67 = &MEM[(struct VEC_int_heap *)_91].base;   

 [local count: 350725279]:  
# _68 = PHI <0B(13), _67(14)>  
# VUSE <.MEM_148>  
_69 = _68->num;
_70 = _69 + 1; 
# .MEM_179 = VDEF <.MEM_148>   
_68->num = _70;
# .MEM_180 = VDEF <.MEM_179>   
MEM  [(int *)_68].vec[_69] = _65; 
--

The problem is, for the malloced stores, 

MEM[(struct vec_prefix *)newmem_111].alloc = _66;
MEM[(struct vec_prefix *)newmem_111].num = 0;

These two statements are marked as dead stores and eliminated, but there is
actually a use chain:

--
 [local count: 39298950]:   
# _91 = PHI <0B(6), newmem_111(9)>
# .MEM_152 = PHI <.MEM_153(6), .MEM_171(9)>
# .MEM_174 = VDEF <.MEM_152>   
li.to_visit = _91;
...

 [local count: 350725279]:
if (_91 != 0B) 
  goto ; [70.00%]   
else   
  goto ; [30.00%]   

 [local count: 245507696]:  
_67 = &MEM[(struct VEC_int_heap *)_91].base; 

 [local count: 350725279]:  
# _68 = PHI <0B(13), _67(14)>  
# VUSE <.MEM_148>  
_69 = _68->num;
_70 = _69 + 1; 
# .MEM_179 = VDEF <.MEM_148>   
_68->num = _70;
# .MEM_180 = VDEF <.MEM_179>   
MEM  [(int *)_68].vec[_69] = _65;
--
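
A minimal C sketch of the shape of this use chain, under stated assumptions: it
is not the 502.gcc_r source, the names `int_vec`, `int_vec_alloc`,
`int_vec_quick_push` and `count_pushed` are hypothetical, and the NULL path of
the PHI is left out. The layout only follows what the dump suggests, an 8-byte
prefix followed by 4-byte elements.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified vector: an 8-byte prefix followed by the
   elements, matching the "_66 * 4 + 8" allocation size in the dump.  */
struct int_vec
{
  unsigned num;   /* elements in use */
  unsigned alloc; /* elements allocated */
  int vec[];      /* element storage */
};

/* Allocate room for RESERVE ints and initialize the prefix.  These two
   prefix stores play the role of the stores that were removed.  */
static struct int_vec *
int_vec_alloc (unsigned reserve)
{
  struct int_vec *v = malloc (sizeof (struct int_vec) + reserve * sizeof (int));
  if (v == NULL)
    abort ();
  v->alloc = reserve;
  v->num = 0;
  return v;
}

/* Append VALUE.  The load of v->num is the later use that keeps the
   "num = 0" store above live.  */
static void
int_vec_quick_push (struct int_vec *v, int value)
{
  assert (v->num < v->alloc);
  v->vec[v->num] = value;
  v->num = v->num + 1;
}

/* Push every positive entry of NUMS and return how many were pushed.  */
static unsigned
count_pushed (const int *nums, unsigned n)
{
  struct int_vec *v = int_vec_alloc (n);
  for (unsigned i = 0; i < n; i++)
    if (nums[i] > 0)
      int_vec_quick_push (v, nums[i]);
  unsigned pushed = v->num;
  free (v);
  return pushed;
}
```

If the `num = 0` store were dropped, the first push would read an
uninitialized counter, which is the kind of failure the dump points at.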

The source code ha

[Bug target/113719] [13/14 regression] g++.target/i386/pr103696.C FAILs

2024-07-15 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #15 from Hongyu Wang  ---
(In reply to Alexandre Oliva from comment #14)
> Fixed in 15.  Maybe backport the last two patches to earlier branches?

Yes, I've backported the original one down to GCC 13, so please do the same.
Thanks!

[Bug tree-optimization/115843] [14/15 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-15 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

Hongyu Wang  changed:

  What |Removed  |Added

    CC |         |hongyuw at gcc dot gnu.org

--- Comment #8 from Hongyu Wang  ---
(In reply to Richard Biener from comment #7)
> Note while on the GCC 14 branch with the fix as posted I see the correct
> 
> movl    $-128, %eax
> vpxor   %xmm2, %xmm2, %xmm2
> kxorb   %k4, %k4, %k4
> kmovb   %eax, %k1
> vmovdqu64   KingSafetyMask1-56(%rip), %zmm0{%k1}{z}
> vmovdqu64   KingSafetyMask1-48(%rip), %zmm1{%k1}{z}
> movl    $64, %eax
> kmovb   %eax, %k2
> ..
> 
> oddly enough on trunk while there's
> 
> (insn 5 26 76 2 (set (reg:QI 4 si [orig:113 loop_mask_57 ] [113])
> (const_int -128 [0xff80])) "t.c":6:1 91 {*movqi_internal}
>  (expr_list:REG_EQUAL (const_int -128 [0xff80])
> (nil)))
> (insn:TI 76 5 92 2 (set (reg:QI 73 k5 [orig:113 loop_mask_57 ] [113])
> (reg:QI 4 si [orig:113 loop_mask_57 ] [113])) "t.c":6:1 91
> {*movqi_internal}
>  (expr_list:REG_DEAD (reg:QI 4 si [orig:113 loop_mask_57 ] [113])
> (nil)))
> 
> in .dfinish there's
> 
> movl    $-128, %esi
> kmovw   %esi, %k5
> 
> in the assembly and we leak extra set bits into %k5.  I have a debug patch
> which then causes the testcase to fail again on trunk but not on the branch.
> How do we end up with kmovw from the above insns?  It looks like
> *movqi_internal might benefit from the new [] syntax - maybe
> alternatives/attributes got mixed up?

movqi_internal emits kmovw for the kmov alternatives under -mno-avx512dq; this
was added in r7-4839-g46e89251c471b2.

So I wonder how GCC 14 chooses kmovb with just -mavx512vl. The code is
unchanged for this part.

But using kmovw for a QImode mask is not correct, as we don't know the value in
the GPR. Perhaps we should consider restricting kmovb to avx512dq only.

[Bug tree-optimization/115843] [14/15 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2

2024-07-15 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #11 from Hongyu Wang  ---
(In reply to Hongtao Liu from comment #10)
> > But using kmovw for QImode mask is not correct as we don't know the value in
> > gpr. Perhaps we'd consider restrict the kmovb under avx512dq only.
> 
> Why? as long as we only care about lower 8 bits, vmovw should be fine.

Ah yes, I was wrong. As long as the uses of the mask do not touch those extra
bits, nothing is wrong. And I suppose the QI->HI conversion will use zext
semantics, so we still get the correct value.
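
A small C model of that reasoning, as a sketch: the `model_` functions are
hypothetical helpers that only mimic the bit-level behavior of the moves (copy
the low 16 or low 8 bits and zero the rest), not real intrinsics or GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of kmovw from a 32-bit GPR: keep the low 16 bits.  */
static uint16_t
model_kmovw (uint32_t gpr)
{
  return (uint16_t) gpr;
}

/* Model of kmovb from a 32-bit GPR: keep the low 8 bits, zero the rest.  */
static uint16_t
model_kmovb (uint32_t gpr)
{
  return (uint16_t) (uint8_t) gpr;
}

/* A QImode use of the mask only looks at the low 8 bits.  */
static uint8_t
model_qi_use (uint16_t k)
{
  return (uint8_t) k;
}

/* A zero-extending QI->HI conversion discards the extra bits.  */
static uint16_t
model_zext_qi_to_hi (uint16_t k)
{
  return (uint16_t) (uint8_t) k;
}

/* Self-check helper: a QImode use sees the same value either way.  */
static void
check_low_byte_agrees (uint32_t gpr)
{
  assert (model_qi_use (model_kmovw (gpr)) == model_qi_use (model_kmovb (gpr)));
}
```

For the constant -128 from the dump (0xffffff80 in a 32-bit register), the
kmovw model leaves 0xff80 in the mask and the kmovb model leaves 0x0080, yet
both a QImode use and a zero-extended copy see 0x80.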

[Bug middle-end/116065] [13/14/15 Regression] Function attribute optimize() might make ISA target attribute broken

2024-07-25 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116065

--- Comment #7 from Hongyu Wang  ---
(In reply to Andrew Pinski from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > then if that is the case then aarch64 started with r14-6290-g9f0f7d802482a8
> > (which added OPT_mearly_ra_ to aarch_option_optimization_table).
> > 
> > What happens if you mark -munroll-only-small-loops as Optimization ?
> > if that works, then aarch64 fix is to mark -mearly-ra= as Optimization too.
> 
> Yes this fixes aarch64 testcase:
> ```
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 2f90f10352a..6229bcb371e 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -256,7 +256,7 @@ EnumValue
>  Enum(early_ra_scope) String(none) Value(AARCH64_EARLY_RA_NONE)
> 
>  mearly-ra=
> -Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra)
> Init(AARCH64_EARLY_RA_NONE) Save
> +Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra)
> Init(AARCH64_EARLY_RA_NONE) Optimization
>  Specify when to enable an early register allocation pass.  The possibilities
>  are: all functions, functions that have access to strided multi-register
>  instructions, and no functions.
> 
> ```
> 
> So yes adding Optimization to -munroll-only-small-loops should fix that too.

Confirmed, appending Optimization fixes this issue on x86.
I'm still confused about how an option not marked Optimization seems to reset
the flags: the target attribute was overridden, and the error reports an ISA
flag mismatch.

[Bug target/119539] [15 Regression] FAIL: gcc.target/i386/apx-nf.c scan-assembler-times {nf} rol 4

2025-03-31 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119539

--- Comment #2 from Hongyu Wang  ---
Created attachment 60925
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60925&action=edit
Untested fix

[Bug target/119539] [15 Regression] FAIL: gcc.target/i386/apx-nf.c scan-assembler-times {nf} rol 4

2025-04-02 Thread hongyuw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119539

Hongyu Wang  changed:

        What |Removed  |Added

  Resolution |---      |FIXED
      Status |NEW      |RESOLVED

--- Comment #5 from Hongyu Wang  ---
Fixed on trunk so far.