https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107199
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Or, for better latency, compress with the full mask, in parallel compute the lane to extract, and then shift/shuffle things to get that lane into the zero position. I also see compress isn't available for HImode or QImode elements.

To recap, we have operations under a mask and we'd like to extract the last operation result from a vector. If we can get a mask with just the last bit set, we can use compress.
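
A rough scalar sketch of the two approaches described in this comment, in C. It is only a model of the idea, not what GCC emits: `compress_model` imitates a zero-masking compress on 16 lanes of 32-bit elements (roughly what AVX-512 `VPCOMPRESSD` does), and all function names here are hypothetical. It assumes a non-zero mask and the GCC builtins `__builtin_clz` and `__builtin_popcount`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define LANES 16

/* Scalar model of a zero-masking compress: pack the lanes selected by
   MASK into the low lanes of DST, in order, and zero the remaining lanes. */
static void compress_model(int32_t dst[LANES], uint16_t mask,
                           const int32_t src[LANES])
{
    int out = 0;
    for (int i = 0; i < LANES; i++)
        if (mask & (1u << i))
            dst[out++] = src[i];
    while (out < LANES)
        dst[out++] = 0;
}

/* Return a mask with only the highest set bit of MASK (the last active lane). */
static uint16_t last_bit_mask(uint16_t mask)
{
    assert(mask != 0);
    int top = (int)(sizeof(unsigned) * CHAR_BIT) - 1
              - __builtin_clz((unsigned)mask);
    return (uint16_t)(1u << top);
}

/* Variant 1: reduce the mask to its last bit, then compress.
   The wanted element lands directly in lane 0. */
static int32_t extract_last_single_bit(uint16_t mask, const int32_t src[LANES])
{
    int32_t tmp[LANES];
    compress_model(tmp, last_bit_mask(mask), src);
    return tmp[0];
}

/* Variant 2: compress with the full mask; the lane index (popcount - 1)
   does not depend on the compress result, so it can be computed in parallel.
   The final indexed read stands in for the shift/shuffle to lane 0. */
static int32_t extract_last_full_compress(uint16_t mask, const int32_t src[LANES])
{
    assert(mask != 0);
    int32_t tmp[LANES];
    compress_model(tmp, mask, src);
    int lane = __builtin_popcount(mask) - 1;
    return tmp[lane];
}
```

Both variants return the same element. The difference is in the dependency chain: in the first, the compress has to wait for the single-bit mask, while in the second the compress and the lane computation are independent and only the final shuffle needs both.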