https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
Richard Biener changed:
  What     |Removed                    |Added
  Assignee |rguenth at gcc dot gnu.org |unassigned at gcc dot gnu.org
--- Comment #10 from Richard Biener ---
So this is now fixed if you use --param vect-partial-vector-usage=2; there is
at the moment no way to get masking and not masking costed against each other.
In theory vect_analyze_loop_costing and vect_estima…
Richard Biener changed:
  What     |Removed                       |Added
  Assignee |unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #9 from rguenther at suse dot de ---
On Tue, 13 Jun 2023, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
>
> --- Comment #8 from Hongtao.liu ---
>
> > Can x86 do this? We'd want to apply t…
--- Comment #8 from Hongtao.liu ---
> Can x86 do this? We'd want to apply this to a scalar, so move ivtmp
> to xmm, apply pack_usat or as you say below, the non-existing us_trunc
> and then broadcast.
I see, we don't have a scalar version. Also …
--- Comment #7 from rguenther at suse dot de ---
On Mon, 12 Jun 2023, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
>
> --- Comment #6 from Hongtao.liu ---
>
> > and the key thing to optimize is
> >
--- Comment #6 from Hongtao.liu ---
> and the key thing to optimize is
>
>   ivtmp_78 = ivtmp_77 + 4294967232; // -64
>   _79 = MIN_EXPR <…>;
>   _80 = (unsigned char) _79;
>   _81 = {_80, _80, _80, _80, _80, _80, _80, _80, _80, _80, _80, _80, _8…
--- Comment #5 from Richard Biener ---
Btw, for the case where we can use the same mask compare type as the type of
the IV (so we know we can represent all required values) we can elide the
saturation. So for example
void foo (double * __rest…
--- Comment #4 from Richard Biener ---
Adding fully masked AVX512 and AVX512 with a masked epilog data:
size  scalar  128    256   512    512e   512f
 1    9.42    11.32  9.35  11.17  15.13  16.89
 2    5.72    6.53   6.66  …
--- Comment #3 from Richard Biener ---
the naive "bad" code-gen produces
size  512-masked
  2   12.19
  4    6.09
  6    4.06
  8    3.04
 12    2.03
 14    1.52
 16    1.21
 20    1.01
 24    0.87
 32    0.76
 34    0.71
 38    0.…
--- Comment #2 from Richard Biener ---
The naive masked epilogue (--param vect-partial-vector-usage=1 and support
for while_ult as in a prototype I have) then looks like
        leal    -1(%rdx), %eax
        cmpl    $62, %eax
        jbe     …
Richard Biener changed:
  What             |Removed |Added
  Blocks           |        |53947
  Last reconfirmed |