https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
Richard Biener changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #9 from rguenther at suse dot de ---
On Tue, 11 Oct 2022, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
>
> --- Comment #8 from Hongtao.liu ---
>
> >
> > One downside for a fully masked b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #8 from Hongtao.liu ---
>
> One downside for a fully masked body is that we're using masked stores
> which usually have higher latency due to the "merge" semantics which
> means an extra memory input + merge operation. Not sure if
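[Editorial illustration of the quoted "merge" semantics, not part of the original message. This is a minimal scalar sketch of what a masked store has to accomplish: lanes with a clear mask bit keep the old memory contents. The function name masked_store_model and the 16-lane limit are assumptions for illustration. It is a conceptual model only and says nothing about how hardware actually implements the store.]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked vector store (lanes <= 16).
   Inactive lanes must end up unchanged, which is modelled here as
   "read old value, merge, write back". */
void masked_store_model(int32_t *dst, const int32_t *src,
                        uint16_t mask, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        int32_t old = dst[i];              /* the extra memory input */
        int active = (mask >> i) & 1;      /* mask bit for this lane */
        dst[i] = active ? src[i] : old;    /* merge, then store */
    }
}
```

With mask 0x5 and four lanes, only lanes 0 and 2 are overwritten.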
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #7 from Richard Biener ---
(In reply to Hongtao.liu from comment #5)
> Also I think a masked epilog (--param=vect-partial-vector-usage=1) should be
> good for general cases under AVX512, especially when the main loop's vector
> width is 512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #6 from Richard Biener ---
(In reply to Hongtao.liu from comment #4)
> Change "*k, CBC" to "?k, CBC" in *mov{qi,hi,si,di}_internal.
> Then RA correctly chooses kxnor for setting constm1_rtx in a mask register,
> and I got the below wit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #5 from Hongtao.liu ---
Also I think a masked epilog (--param=vect-partial-vector-usage=1) should be good
for general cases under AVX512, especially when the main loop's vector width is
512, and the remaining trip count is not enough for 256-bi
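[Editorial illustration, not part of the original message. A rough scalar sketch of the loop shape a masked epilog gives: a full-width main loop followed by a single masked iteration for the leftover elements, instead of a narrower vector or scalar remainder loop. VF = 16 (sixteen 32-bit lanes in a 512-bit vector) and the function name add_one are assumptions for illustration. The code the vectorizer actually emits is not shown in this thread.]

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 16 };  /* assumed: 16 x 32-bit lanes per 512-bit vector */

/* Hypothetical example loop: add 1 to each of the n elements of a. */
void add_one(int32_t *a, size_t n)
{
    size_t i = 0;

    /* Main loop: each outer iteration models one full-width vector op. */
    for (; i + VF <= n; i += VF)
        for (size_t l = 0; l < VF; l++)
            a[i + l] += 1;

    /* Masked epilog: one extra iteration covers the 1..VF-1 leftovers. */
    size_t rem = n - i;
    if (rem != 0) {
        uint32_t mask = (1u << rem) - 1u;  /* low rem bits set */
        for (size_t l = 0; l < VF; l++)
            if ((mask >> l) & 1u)          /* inactive lanes untouched */
                a[i + l] += 1;
    }
}
```

For n = 19 the main loop runs once and the epilog handles the last 3 elements with mask 0x7.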
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #4 from Hongtao.liu ---
Change "*k, CBC" to "?k, CBC" in *mov{qi,hi,si,di}_internal.
Then RA correctly chooses kxnor for setting constm1_rtx in a mask register,
and I got the below with your attached patch (changing #if 0 to #if 1), seems
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #3 from CVS Commits ---
The master branch has been updated by hongtao Liu:
https://gcc.gnu.org/g:498ad738690b3c464f901d63dcd4d0f49a50dd00
commit r13-3218-g498ad738690b3c464f901d63dcd4d0f49a50dd00
Author: liuhongt
Date: Mon Oct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #2 from Hongtao.liu ---
For the UNSPEC part, we can create a new define_insn with a general operation
that accepts both gpr and mask alternatives, just like other logic patterns.
For the gpr version, we can split it into xor + not after reload.
Fo
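[Editorial illustration, not part of the original message. Assuming the operation under discussion is an XNOR (suggested by the kxnor mentioned in comment #4), the "xor + not" split for general-purpose registers corresponds to the two-step computation below. The function name xnor_u32 is hypothetical. This is C arithmetic, not the machine description pattern itself.]

```c
#include <stdint.h>

/* Hypothetical helper: XNOR expressed as the two steps named in the
   comment, first xor, then not. */
uint32_t xnor_u32(uint32_t a, uint32_t b)
{
    uint32_t t = a ^ b;  /* step 1: xor */
    return ~t;           /* step 2: not */
}
```

XNOR of a value with itself gives all ones, which is why a kxnor of a mask register with itself can materialize the all-ones constant.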
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093
--- Comment #1 from Richard Biener ---
Created attachment 53645
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53645&action=edit
prototype for WHILE_ULT
I'm playing with the attached. Note it requires the third operand patch for
WHILE_UL