[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2023-07-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 Richard Biener changed:
What       |Removed      |Added
Resolution |---          |FIXED
Status     |UNCONFIRMED

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-10-11 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #9 from rguenther at suse dot de --- On Tue, 11 Oct 2022, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 > > --- Comment #8 from Hongtao.liu --- > > > > > One downside for a fully masked b

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-10-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #8 from Hongtao.liu --- > > One downside for a fully masked body is that we're using masked stores > which usually have higher latency due to the "merge" semantics which > means an extra memory input + merge operation. Not sure if
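To make the "merge" semantics mentioned in the quoted text concrete, here is a minimal scalar sketch of a 16-lane masked store. The function name `masked_store_merge` is hypothetical, and the code models only the observable result (inactive lanes keep their old memory contents), not how any particular CPU implements the store.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 16-lane masked store with merge
   semantics.  Lanes whose mask bit is 0 keep the previous contents of
   dst, so the old value is conceptually an extra input to the store.  */
static void masked_store_merge(int32_t *dst, const int32_t *src, uint16_t mask)
{
    for (unsigned lane = 0; lane < 16; lane++) {
        int32_t old = dst[lane];              /* the extra memory input */
        int active = (mask >> lane) & 1;      /* is this lane enabled?  */
        dst[lane] = active ? src[lane] : old; /* the merge step         */
    }
}
```

With mask 0x0005 only lanes 0 and 2 are overwritten; every other lane of `dst` is left as it was.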

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-10-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #7 from Richard Biener --- (In reply to Hongtao.liu from comment #5) > Also I think masked epilog (--param=vect-partial-vector-usage=1) should be > good for general cases under AVX512, especially when main loop's vector > width is 512

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-10-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #6 from Richard Biener --- (In reply to Hongtao.liu from comment #4) > change "*k, CBC" to "?k, CBC", in *mov{qi,hi,si,di}_internal. > then RA works well and chooses kxnor for setting constm1_rtx to mask register, > and I got below wit

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-10-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #5 from Hongtao.liu --- Also I think masked epilog (--param=vect-partial-vector-usage=1) should be good for general cases under AVX512, especially when main loop's vector width is 512, and the remain tripcount is not enough for 256-bi
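As a rough illustration of the loop shape a masked epilog gives, the sketch below models it in scalar C: an unmasked main loop over full 16-lane chunks, then a single epilog iteration whose lanes are enabled by a mask. The function name, the lane count `VL`, and the add-one loop body are assumptions chosen for the example; this is not the code GCC generates.

```c
#include <stddef.h>
#include <stdint.h>

#define VL 16  /* assumed: 16 lanes of 32-bit elements in a 512-bit vector */

/* Hypothetical example: add 1 to each of n ints using a full-width main
   loop plus one masked epilog iteration, modelled with scalar code.  */
static void add_one_masked_epilog(int32_t *a, size_t n)
{
    size_t i = 0;

    /* Main loop: all lanes are active, so no mask is needed.  */
    for (; i + VL <= n; i += VL)
        for (size_t lane = 0; lane < VL; lane++)
            a[i + lane] += 1;

    /* Epilog: a single iteration; only lanes with i + lane < n are
       enabled.  Here n - i is between 1 and VL - 1, so the shift is safe.  */
    if (i < n) {
        uint16_t mask = (uint16_t)((1u << (n - i)) - 1u);
        for (size_t lane = 0; lane < VL; lane++)
            if (mask & (1u << lane))
                a[i + lane] += 1;
    }
}
```

For n = 21 the main loop runs once and the epilog runs with mask 0x001f, so no narrower vector loop or scalar tail is required for the remaining five elements.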

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-10-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #4 from Hongtao.liu --- change "*k, CBC" to "?k, CBC", in *mov{qi,hi,si,di}_internal. then RA works well and chooses kxnor for setting constm1_rtx to mask register, and I got below with your attached patch (change #if 0 to #if 1), seems
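The reason kxnor can materialize an all-ones mask is plain bitwise arithmetic: XNOR of any value with itself is all ones. A minimal scalar sketch, with hypothetical helper names and a 16-bit mask width assumed for illustration:

```c
#include <stdint.h>

/* Scalar model of a 16-bit mask XNOR, written as xor followed by not.  */
static uint16_t mask_xnor(uint16_t a, uint16_t b)
{
    return (uint16_t)~(a ^ b);
}

/* x XNOR x is all ones for any x, so no constant load is needed.  */
static uint16_t mask_all_ones(uint16_t any)
{
    return mask_xnor(any, any);
}
```

`mask_all_ones` returns 0xffff regardless of its argument, which is why the register's previous contents do not matter.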

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-10-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #3 from CVS Commits --- The master branch has been updated by hongtao Liu : https://gcc.gnu.org/g:498ad738690b3c464f901d63dcd4d0f49a50dd00 commit r13-3218-g498ad738690b3c464f901d63dcd4d0f49a50dd00 Author: liuhongt Date: Mon Oct

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-10-09 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #2 from Hongtao.liu --- For UNSPEC part, we can create a new define_insn with general operation and accept both gpr and mask alternatives just like other logic patterns. For gpr version, we can split it to xor + not after reload. Fo

[Bug target/107093] AVX512 mask operations not simplified in fully masked loop

2022-09-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093 --- Comment #1 from Richard Biener --- Created attachment 53645 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53645&action=edit prototype for WHILE_ULT I'm playing with the attached. Note it requires the third operand patch for WHILE_UL
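For readers unfamiliar with WHILE_ULT, here is a scalar sketch of the mask it is understood to produce in its basic two-operand form: lane i is active while start + i is less than end under unsigned comparison. The helper name and the 16-lane limit are assumptions for illustration; the third operand mentioned in the comment is not modelled.

```c
#include <stdint.h>

/* Hypothetical scalar model of a WHILE_ULT-style mask for up to 16
   lanes.  Bit i is set when (start + i) < end, compared as unsigned.
   Wrap-around of start + i is not handled in this sketch.  */
static uint16_t while_ult_mask(uint64_t start, uint64_t end, unsigned nlanes)
{
    uint16_t mask = 0;
    for (unsigned i = 0; i < nlanes && i < 16; i++)
        if (start + i < end)
            mask |= (uint16_t)(1u << i);
    return mask;
}
```

In a fully masked loop of 21 iterations with 16 lanes, the first vector iteration would get `while_ult_mask(0, 21, 16)`, which is all ones, and the second would get `while_ult_mask(16, 21, 16)`, which enables only the low five lanes.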