https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107199

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Or, for better latency, compress with full mask, compute the lane to extract
in parallel, and then shift/shuffle things to get that lane into position zero.
I also see that compress isn't available for HImode or QImode elements.
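
One possible reading of the above, as a scalar C sketch rather than the actual
vector code: compress under the whole operation mask, compute the index of the
last packed lane (popcount of the mask minus one) independently, then pick that
lane. The names compress_full, last_packed_lane and extract_via_full_mask, the
16-lane width and the zero-filling compress are all assumptions made for
illustration; the final array index stands in for the shift/shuffle to lane zero.

```c
#include <stdint.h>

#define LANES 16

/* Scalar model of a zero-filling compress: pack the lanes selected by
   mask into the low lanes of dst, zero the rest.  (Hypothetical helper.) */
static void compress_full(int32_t dst[LANES], uint16_t mask,
                          const int32_t src[LANES])
{
    int out = 0;
    for (int i = 0; i < LANES; i++)
        if (mask & (1u << i))
            dst[out++] = src[i];
    while (out < LANES)
        dst[out++] = 0;
}

/* Index of the last packed lane: popcount(mask) - 1, or -1 if mask is 0.
   This does not depend on the compress result, so it can run in parallel. */
static int last_packed_lane(uint16_t mask)
{
    int n = 0;
    for (; mask; mask &= mask - 1)
        n++;
    return n - 1;
}

/* Returns the last active element of src, or 0 when no lane is active. */
static int32_t extract_via_full_mask(uint16_t mask, const int32_t src[LANES])
{
    int32_t packed[LANES];
    compress_full(packed, mask, src);
    int lane = last_packed_lane(mask);
    /* Indexing models the shift/shuffle that moves the lane to position 0. */
    return lane < 0 ? 0 : packed[lane];
}
```

With src[i] = 100 + i and mask 0x00A4 (lanes 2, 5 and 7 active), the packed
vector starts 102, 105, 107 and the lane index is 2.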

To recap, we have operations under a mask and we'd like to extract the
last operation result from a vector.

If we can get a mask with just the last bit set, we can use compress.
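
A scalar C model of that idea, as a sketch only: reduce the mask to its highest
set bit, then compress under that single-bit mask so the wanted element lands
in lane zero. The names last_active_bit, compress_model and
extract_last_active, the 16-lane width and the zero-filling compress are
assumptions for illustration, not GCC internals or a specific instruction
sequence.

```c
#include <stdint.h>

#define LANES 16

/* Keep only the highest set bit of mask (0 stays 0): smear the top bit
   downwards, then clear everything below it. */
static uint16_t last_active_bit(uint16_t mask)
{
    uint16_t m = mask;
    m |= m >> 1;
    m |= m >> 2;
    m |= m >> 4;
    m |= m >> 8;
    return (uint16_t)(m ^ (m >> 1));
}

/* Scalar model of a zero-filling compress: pack the lanes selected by
   mask into the low lanes of dst, zero the rest.  (Hypothetical helper.) */
static void compress_model(int32_t dst[LANES], uint16_t mask,
                           const int32_t src[LANES])
{
    int out = 0;
    for (int i = 0; i < LANES; i++)
        if (mask & (1u << i))
            dst[out++] = src[i];
    while (out < LANES)
        dst[out++] = 0;
}

/* Returns the last active element of src, or 0 when no lane is active. */
static int32_t extract_last_active(uint16_t mask, const int32_t src[LANES])
{
    int32_t packed[LANES];
    compress_model(packed, last_active_bit(mask), src);
    return packed[0];   /* a single-bit mask puts the element in lane 0 */
}
```

With src[i] = 100 + i and mask 0x00A4 (lanes 2, 5 and 7 active), the reduced
mask is 0x0080 and lane zero of the compressed vector holds 107.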
