[Bug c++/113719] [13/14/15 regression] g++.target/i386/pr103696.C FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #4 from Hongyu Wang ---
Created attachment 58211
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58211&action=edit
A patch

Hi Rainer,

Could you try the attachment and see whether it resolves the error? I tested with a cross-compiled Solaris GCC, but it hits an error in varasm for 64-bit, so I'm not sure the patch passes all 32/64-bit tests.
[Bug target/113719] [13/14/15 regression] g++.target/i386/pr103696.C FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2024-05-16
     Ever confirmed|0                           |1
          Component|c++                         |target
[Bug target/113719] [13/14/15 regression] g++.target/i386/pr103696.C FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Hongyu Wang ---
Fixed on GCC 13/14/15.
[Bug target/115341] [15 regression] gcc.target/i386/apx-ndd-2.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115341

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hongyuw at gcc dot gnu.org
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from Hongyu Wang ---
Fixed on trunk so far.
[Bug target/115370] New: [15 regression] gcc.target/i386/pr77881.c FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

            Bug ID: 115370
           Summary: [15 regression] gcc.target/i386/pr77881.c FAIL
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hongyuw at gcc dot gnu.org
  Target Milestone: ---

After x86 ccmp support was added in r15-1060-g0b6cea8783b9e1, there is a new failure:

FAIL: gcc.target/i386/pr77881.c scan-assembler js[ \t].?L

The codegen changed from

        testq   %rdi, %rdi
        js      .L4
        testl   %edx, %edx
        jne     .L4
        ret

to

        shrq    $63, %rdi
        testl   %edx, %edx
        setne   %al
        orb     %dil, %al
        jne     .L11
        ret
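The two sequences in the report compute the same predicate in different ways. A minimal C sketch of that difference follows, assuming the tested condition has the form `a < 0 || b`; this is not the pr77881.c source, and the function names `taken_jumpy` and `taken_combined` are hypothetical.

```c
#include <stdint.h>

/* Jumpy form: mirrors testq/js followed by testl/jne.
   Each condition gets its own conditional branch.  */
int
taken_jumpy (int64_t a, int32_t b)
{
  if (a < 0)
    return 1;
  if (b != 0)
    return 1;
  return 0;
}

/* Combined form: mirrors shrq $63 / setne / orb.
   The sign bit of A is shifted down to bit 0 and ORed with (b != 0),
   so only one branch on the merged result remains.  */
int
taken_combined (int64_t a, int32_t b)
{
  uint8_t sign = (uint8_t) ((uint64_t) a >> 63);
  uint8_t nonzero = (uint8_t) (b != 0);
  return (sign | nonzero) != 0;
}
```

Both functions return the same value for every input; the scan-assembler pattern fails only because the combined form no longer contains a `js` instruction.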
[Bug target/115370] [15 regression] gcc.target/i386/pr77881.c FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*

--- Comment #1 from Hongyu Wang ---
The issue is in cfgexpand.cc:2648

      /* If jumps are cheap and the target does not support conditional
         compare, turn some more codes into jumpy sequences.  */
      else if (BRANCH_COST (optimize_insn_for_speed_p (), false) < 4
               && targetm.gen_ccmp_first == NULL)
        {

x86 now defines targetm.gen_ccmp_first, but that does not mean ccmp is enabled by default, since it requires -mapxf. I guess we need a new target hook, have_ccmp.
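To illustrate why the NULL check is too coarse, here is a small standalone model of the decision. It is only a sketch: `struct target_model`, the `example_*` hooks, and the signature given to `have_ccmp` are assumptions made for illustration, not GCC's actual target vector or the hook as it was eventually implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the target vector; only the two hooks
   discussed in the comment are modeled.  */
struct target_model
{
  void *(*gen_ccmp_first) (void);  /* non-NULL once the backend defines it */
  bool (*have_ccmp) (void);        /* proposed: is ccmp usable right now?  */
};

/* Example hooks for the model (hypothetical).  */
void *example_gen_ccmp_first (void) { return NULL; }
bool example_ccmp_disabled (void) { return false; }  /* e.g. no -mapxf */
bool example_ccmp_enabled (void) { return true; }    /* e.g. with -mapxf */

/* Existing check: the mere presence of gen_ccmp_first turns off the
   jumpy expansion, even when ccmp cannot actually be used.  */
bool
use_jumpy_current (const struct target_model *t, int branch_cost)
{
  return branch_cost < 4 && t->gen_ccmp_first == NULL;
}

/* Proposed check: ask the target whether ccmp is really available.  */
bool
use_jumpy_proposed (const struct target_model *t, int branch_cost)
{
  return branch_cost < 4 && !t->have_ccmp ();
}
```

With a target that defines `gen_ccmp_first` but reports ccmp as disabled, `use_jumpy_current` returns false while `use_jumpy_proposed` returns true, which is the behaviour difference the comment describes.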
[Bug target/112943] [14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1176 with -O2 -march=westmere -mapxf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112943

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #5 from Hongyu Wang ---
Fixed on trunk.
[Bug target/112943] [14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1176 with -O2 -march=westmere -mapxf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112943

--- Comment #6 from Hongyu Wang ---
(In reply to Hongtao Liu from comment #3)
> (In reply to Jakub Jelinek from comment #1)
> > Why does ix86_expand_binary_operator have the use_ndd argument at all?
> > Shouldn't it always act as if the argument is TARGET_APX_NDD?
> > Or, any particular reason why it isn't done in ashl3 (but in other
> > shifts/rotates)?
> By the time we support apx_ndd, the use_ndd is introduced to enable ndd
> pattern by pattern so that avoid other patterns crash, and now that we've
> completed the ndd patch, I think we can try to remove it. We need to make
> sure that there is no pattern under TARGET_APX_NDD but force a call to
> ix86_expand_binary_operator with use_ndd as false.

ix86_expand_binary_operator and the other binary fixup functions are not only applied to legacy insns; they are also used in SSE/MMX patterns. If we drop the parameter, we need to maintain those vector patterns that could potentially call the fixup functions at the post-reload stage. So from a design perspective, it is better to involve only the insns related to NDD and not touch the vector insns.
[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #4 from Hongyu Wang ---
Previously I added

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d564198f960a2f5994dde3f6b83d7a62021e49c3

to prohibit the use of several *POFF constants in the NDD add alternative. If checking ADDR_SPACE_GENERIC can avoid the seg prefix usage, can we drop that change?

Also, I'd suggest using the j prefix for all APX-related constraints, like jf.
[Bug target/113711] APX instruction set and instructions longer than 15 bytes (assembly warning)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113711

--- Comment #6 from Hongyu Wang ---
(In reply to H.J. Lu from comment #5)
> (In reply to Hongyu Wang from comment #4)
> > Previously I added
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;
> > h=d564198f960a2f5994dde3f6b83d7a62021e49c3
> >
> > to prohibit several *POFF constant usage in NDD add alternative. If checking
> > ADDR_SPACE_GENERIC can avoid the seg prefix usage, we can drop that change?
>
> Are there are any testcases for this change?
>

Cut and edited from gcc.dg\torture\tls\tls-test.c:

#include
__thread int a = 255;
__thread int *b;
int *volatile a_in_other_thread = (int *) 12345;

void *
thread_func (void *arg)
{
  a_in_other_thread = &a; // Previously this would try to generate addq $a@tpoff, %fs:0, %rax
  a += 11144;             // This was not fixed on trunk, as UNSPEC_TPOFF is in a mem operand
  *((int *) arg) = a;
  return (void *) 0;
}
[Bug target/113751] -mapxf -mfma4 generates wrong assembly code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113751

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hongyuw at gcc dot gnu.org

--- Comment #1 from Hongyu Wang ---
We haven't disabled AMD ISAs like XOP/FMA4, as they are not expected to be used together with APX.

Quoted from Richard's comment

> We haven’t disabled EGPR for 3DNOW/XOP/LWP/FMA4/TBM instructions, as they will
> be co-operated with -mapxf. We can disable EGPR for them if AMD guys requires.

I think most of these are retired by now, so it's unlikely an implementation providing these and also APX will appear.
[Bug target/115463] [15 regression] 526.blender_r regressed 5% on Zen2 with -Ofast -flto -march=native since r15-1058-gc989e59fc99d99
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115463

--- Comment #3 from Hongyu Wang ---
Should be fixed, but I will wait for confirmation of the SPEC results on znver/skylake.
[Bug target/115463] [15 regression] 526.blender_r regressed 5% on Zen2 with -Ofast -flto -march=native since r15-1058-gc989e59fc99d99
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115463
Bug 115463 depends on bug 115370, which changed state.

Bug 115370 Summary: [15 regression] gcc.target/i386/pr77881.c FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
[Bug target/115370] [15 regression] gcc.target/i386/pr77881.c FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #5 from Hongyu Wang ---
Fixed on GCC 15.
[Bug tree-optimization/115256] [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hongyuw at gcc dot gnu.org

--- Comment #4 from Hongyu Wang ---
Part of the dump for create_preheaders before DSE

--
  [local count: 29277718]:
  # .MEM_153 = PHI <.MEM_161(4), .MEM_161(3)>
  # _15 = PHI <_17(4), 0(3)>
  # _120 = PHI <_19(4), 0(3)>
  if (_120 == 0)
    goto ; [45.64%]
  else
    goto ; [54.36%]

  [local count: 27536775]:
  # _66 = PHI <_15(6), _17(5)>
  # .MEM_125 = PHI <.MEM_153(6), .MEM_164(5)>
  _87 = (long unsigned int) _66;
  _88 = _87 * 4;
  _89 = _88 + 8;
  _110 = _89;
  # .MEM_167 = VDEF <.MEM_125>
  newmem_111 = malloc (_110);
  if (newmem_111 == 0B)
    goto ; [0.04%]
  else
    goto ; [99.96%]

  [local count: 11015]:
  # .MEM_168 = VDEF <.MEM_167>
  xmalloc_failed (_110);

  [local count: 27536775]:
  # .MEM_154 = PHI <.MEM_167(7), .MEM_168(8)>
  # .MEM_170 = VDEF <.MEM_154>
  MEM[(struct vec_prefix *)newmem_111].alloc = _66;
  # .MEM_171 = VDEF <.MEM_170>
  MEM[(struct vec_prefix *)newmem_111].num = 0;

  [local count: 39298950]:
  # _91 = PHI <0B(6), newmem_111(9)>
  # .MEM_152 = PHI <.MEM_153(6), .MEM_171(9)>
  # .MEM_174 = VDEF <.MEM_152>
  li.to_visit = _91;
  # VUSE <.MEM_174>
  _61 = cfun;
  # VUSE <.MEM_174>
  _62 = _61->x_current_loops;
  # VUSE <.MEM_174>
  _63 = _62->tree_root;

  [local count: 77159561]:
  # aloop_80 = PHI <_63(10), _108(26)>
  # .MEM_147 = PHI <.MEM_174(10), .MEM_90(26)>

  [local count: 701450557]:
  # aloop_64 = PHI
  # .MEM_148 = PHI <.MEM_147(11), .MEM_149(16)>
  # VUSE <.MEM_148>
  _65 = aloop_64->num;
  if (_65 > 0)
    goto ; [50.00%]
  else
    goto ; [50.00%]

  [local count: 350725279]:
  if (_91 != 0B)
    goto ; [70.00%]
  else
    goto ; [30.00%]

  [local count: 245507696]:
  _67 = &MEM[(struct VEC_int_heap *)_91].base;

  [local count: 350725279]:
  # _68 = PHI <0B(13), _67(14)>
  # VUSE <.MEM_148>
  _69 = _68->num;
  _70 = _69 + 1;
  # .MEM_179 = VDEF <.MEM_148>
  _68->num = _70;
  # .MEM_180 = VDEF <.MEM_179>
  MEM [(int *)_68].vec[_69] = _65;
--

The problem is with the stores to the malloced memory:

  MEM[(struct vec_prefix *)newmem_111].alloc = _66;
  MEM[(struct vec_prefix *)newmem_111].num = 0;

These two stmts are marked as dead stores and eliminated, but there is actually a use chain:

--
  [local count: 39298950]:
  # _91 = PHI <0B(6), newmem_111(9)>
  # .MEM_152 = PHI <.MEM_153(6), .MEM_171(9)>
  # .MEM_174 = VDEF <.MEM_152>
  li.to_visit = _91;
  ...
  [local count: 350725279]:
  if (_91 != 0B)
    goto ; [70.00%]
  else
    goto ; [30.00%]

  [local count: 245507696]:
  _67 = &MEM[(struct VEC_int_heap *)_91].base;

  [local count: 350725279]:
  # _68 = PHI <0B(13), _67(14)>
  # VUSE <.MEM_148>
  _69 = _68->num;
  _70 = _69 + 1;
  # .MEM_179 = VDEF <.MEM_148>
  _68->num = _70;
  # .MEM_180 = VDEF <.MEM_179>
  MEM [(int *)_68].vec[_69] = _65;
--

The source code ha
[Bug target/113719] [13/14 regression] g++.target/i386/pr103696.C FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113719

--- Comment #15 from Hongyu Wang ---
(In reply to Alexandre Oliva from comment #14)
> Fixed in 15. Maybe backport the last two patches to earlier branches?

Yes, I've backported the original one down to GCC 13, so please do the same. Thanks!
[Bug tree-optimization/115843] [14/15 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hongyuw at gcc dot gnu.org

--- Comment #8 from Hongyu Wang ---
(In reply to Richard Biener from comment #7)
> Note while on the GCC 14 branch with the fix as posted I see the correct
>
>         movl    $-128, %eax
>         vpxor   %xmm2, %xmm2, %xmm2
>         kxorb   %k4, %k4, %k4
>         kmovb   %eax, %k1
>         vmovdqu64       KingSafetyMask1-56(%rip), %zmm0{%k1}{z}
>         vmovdqu64       KingSafetyMask1-48(%rip), %zmm1{%k1}{z}
>         movl    $64, %eax
>         kmovb   %eax, %k2
> ..
>
> oddly enough on trunk while there's
>
> (insn 5 26 76 2 (set (reg:QI 4 si [orig:113 loop_mask_57 ] [113])
>         (const_int -128 [0xff80])) "t.c":6:1 91 {*movqi_internal}
>      (expr_list:REG_EQUAL (const_int -128 [0xff80])
>         (nil)))
> (insn:TI 76 5 92 2 (set (reg:QI 73 k5 [orig:113 loop_mask_57 ] [113])
>         (reg:QI 4 si [orig:113 loop_mask_57 ] [113])) "t.c":6:1 91
> {*movqi_internal}
>      (expr_list:REG_DEAD (reg:QI 4 si [orig:113 loop_mask_57 ] [113])
>         (nil)))
>
> in .dfinish there's
>
>         movl    $-128, %esi
>         kmovw   %esi, %k5
>
> in the assembly and we leak extra set bits into %k5. I have a debug patch
> which then causes the testcase to fail again on trunk but not on the branch.
> How do we end up with kmovw from the above insns? It looks like
> *movqi_internal might benefit from the new [] syntax - maybe
> alternatives/attributes got mixed up?

movqi_internal emits kmovw for the kmov alternatives under -mno-avx512dq; this was added in r7-4839-g46e89251c471b2.

So I wonder how GCC 14 chooses kmovb with just -mavx512vl. The code is the same for this part.

But using kmovw for a QImode mask is not correct, as we don't know the value in the GPR. Perhaps we should consider restricting the kmovb to avx512dq only.
[Bug tree-optimization/115843] [14/15 Regression] 531.deepsjeng_r fails to verify with -O3 -march=znver4 --param vect-partial-vector-usage=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #11 from Hongyu Wang ---
(In reply to Hongtao Liu from comment #10)
> > But using kmovw for QImode mask is not correct as we don't know the value in
> > gpr. Perhaps we'd consider restrict the kmovb under avx512dq only.
>
> Why? as long as we only care about lower 8 bits, vmovw should be fine.

Ah yes, I was wrong. As long as the uses of the mask do not touch those extra bits, there is nothing wrong. And I suppose the QI->HI conversion uses zext semantics, so we still get the correct value.
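The argument can be checked with a simplified model in C. This is a sketch only: the mask register is modeled as a 16-bit value, and the helper names (`mask_from_gpr_16`, `mask_from_gpr_8`, `qi_view`, `zext_qi_to_hi`) are hypothetical, not GCC or ISA identifiers.

```c
#include <stdint.h>

/* kmovw-style move: copies the low 16 bits of the GPR into the mask.  */
uint16_t
mask_from_gpr_16 (uint32_t gpr)
{
  return (uint16_t) gpr;
}

/* kmovb-style move: copies only the low 8 bits; upper bits are zero.  */
uint16_t
mask_from_gpr_8 (uint32_t gpr)
{
  return (uint8_t) gpr;
}

/* A QImode consumer looks only at mask bits 0..7.  */
uint8_t
qi_view (uint16_t mask)
{
  return (uint8_t) mask;
}

/* QI->HI conversion with zero-extension semantics: bits 8..15 are
   cleared regardless of what the wider move left there.  */
uint16_t
zext_qi_to_hi (uint16_t mask)
{
  return (uint16_t) (uint8_t) mask;
}
```

For the constant -128 from the quoted insns, the 16-bit move leaves extra set bits above bit 7, but both the QImode view and the zero-extended value agree with what the 8-bit move would have produced.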
[Bug middle-end/116065] [13/14/15 Regression] Function attribute optimize() might make ISA target attribute broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116065

--- Comment #7 from Hongyu Wang ---
(In reply to Andrew Pinski from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > then if that is the case then aarch64 started with r14-6290-g9f0f7d802482a8
> > (which added OPT_mearly_ra_ to aarch_option_optimization_table).
> >
> > What happens if you mark -munroll-only-small-loops as Optimization ?
> > if that works, then aarch64 fix is to mark -mearly-ra= as Optimization too.
>
> Yes this fixes aarch64 testcase:
> ```
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 2f90f10352a..6229bcb371e 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -256,7 +256,7 @@ EnumValue
>  Enum(early_ra_scope) String(none) Value(AARCH64_EARLY_RA_NONE)
>
>  mearly-ra=
> -Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra)
> Init(AARCH64_EARLY_RA_NONE) Save
> +Target RejectNegative Joined Enum(early_ra_scope) Var(aarch64_early_ra)
> Init(AARCH64_EARLY_RA_NONE) Optimization
>  Specify when to enable an early register allocation pass.  The possibilities
> are: all functions, functions that have access to strided multi-register
> instructions, and no functions.
>
> ```
>
> So yes adding Optimization to -munroll-only-small-loops should fix that too.

Confirmed, appending Optimization fixes this issue on x86.

I'm quite confused by how the option not being marked Optimization seems to reset the flags: the target attribute was overridden, and the error reads like an ISA flag mismatch.
[Bug target/119539] [15 Regression] FAIL: gcc.target/i386/apx-nf.c scan-assembler-times {nf} rol 4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119539

--- Comment #2 from Hongyu Wang ---
Created attachment 60925
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60925&action=edit
Untested fix
[Bug target/119539] [15 Regression] FAIL: gcc.target/i386/apx-nf.c scan-assembler-times {nf} rol 4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119539

Hongyu Wang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #5 from Hongyu Wang ---
Fixed on trunk so far.