[Bug target/107199] AVX512 fully masked loop vectorization needs extract_last pattern for vectorization of live variables

rguenth at gcc dot gnu.org via Gcc-bugs Mon, 24 Jul 2023 01:48:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107199


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Or, for better latency, compress with the full mask, compute the lane to
extract in parallel, and then shift/shuffle to move that lane into position
zero.  I also see that compress isn't available for HImode or QImode elements.
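
For illustration only, here is a scalar C sketch of the compress-with-full-mask
variant described above.  It is a model of the idea, not GCC's implementation
and not AVX512 intrinsics: the 16-lane width, the int32_t element type and the
names compress_model, popcount16 and extract_last_via_full_compress are all
assumptions made for this example.  The compress packs the active lanes to the
low end, and the index of the last active element is popcount(mask) - 1; the two
do not depend on each other, which is where the latency benefit would come from.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* assumed: 16 x 32-bit lanes, as in one 512-bit vector */

/* Scalar model of a zero-masking compress: active lanes are packed
   contiguously into the low lanes of dst, the remaining lanes are zeroed. */
static void compress_model(int32_t dst[LANES], const int32_t src[LANES],
                           uint16_t mask)
{
    int out = 0;
    for (int i = 0; i < LANES; i++)
        if (mask & (1u << i))
            dst[out++] = src[i];
    while (out < LANES)
        dst[out++] = 0;
}

/* Number of set bits in a 16-bit mask. */
static int popcount16(uint16_t m)
{
    int n = 0;
    while (m) {
        m &= (uint16_t)(m - 1);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Compress with the full loop mask; independently compute which packed
   lane holds the last active element, then select that lane.  The final
   array index stands in for the shift/shuffle to lane zero. */
static int32_t extract_last_via_full_compress(const int32_t src[LANES],
                                              uint16_t loop_mask)
{
    int32_t packed[LANES];
    assert(loop_mask != 0);               /* at least one active lane */
    compress_model(packed, src, loop_mask);
    int lane = popcount16(loop_mask) - 1; /* independent of the compress */
    return packed[lane];
}
```

With a loop mask of 0x002D (lanes 0, 2, 3 and 5 active), the packed vector holds
four elements and lane 3 of it is the original lane 5.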

To recap, we have operations under a mask and we'd like to extract the
last operation result from a vector.

If we can get a mask with just the last bit set we can use compress.
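
The single-bit-mask variant can be sketched the same way.  Again this is a
scalar C model under stated assumptions (16 lanes of int32_t; the names
last_active_bit, compress_lanes and extract_last_single_bit are hypothetical),
not the pattern GCC would emit.  The mask is reduced to its highest set bit, and
a compress under that one-bit mask leaves the wanted element in lane zero.

```c
#include <assert.h>
#include <stdint.h>

#define VLEN 16  /* assumed: 16 x 32-bit lanes */

/* Isolate the highest set bit of the mask: smear it into all lower bit
   positions, then XOR with the smeared value shifted right by one.
   Returns 0 for an empty mask. */
static uint16_t last_active_bit(uint16_t m)
{
    m |= m >> 1;
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;
    return (uint16_t)(m ^ (m >> 1));
}

/* Scalar model of a zero-masking compress (same semantics as packing the
   active lanes into the low lanes and zeroing the rest). */
static void compress_lanes(int32_t dst[VLEN], const int32_t src[VLEN],
                           uint16_t mask)
{
    int out = 0;
    for (int i = 0; i < VLEN; i++)
        if (mask & (1u << i))
            dst[out++] = src[i];
    while (out < VLEN)
        dst[out++] = 0;
}

/* Compress under a mask with only the last active bit set; the selected
   element then sits in lane zero. */
static int32_t extract_last_single_bit(const int32_t src[VLEN],
                                       uint16_t loop_mask)
{
    int32_t packed[VLEN];
    assert(loop_mask != 0);  /* at least one active lane */
    compress_lanes(packed, src, last_active_bit(loop_mask));
    return packed[0];
}
```

Here the mask computation sits on the critical path in front of the compress,
which is the latency cost the full-mask variant tries to avoid.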
