https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111874

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
> For the case of conditional (or loop masked) fold-left reductions the scalar
> fallback isn't implemented.  But AVX512 has vpcompress that could be used
> to implement a more efficient sequence for a masked fold-left, possibly
> using a loop and population count of the mask.
That sequence needs an extra kmov + vpcompress + popcnt, so I'm afraid the
performance could be worse than the scalar version.
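
To make the sequence under discussion concrete, here is a rough sketch in portable C that models it in scalar code rather than with real AVX512 intrinsics. All function names are hypothetical, the 16-lane width assumes 512-bit vectors of float, and this is not what GCC emits or would necessarily emit. compress_model stands in for vpcompressps, and __builtin_popcount (a GCC builtin) stands in for the kmov + popcnt pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* assumption: 16 floats per 512-bit vector */

/* Reference: conditional in-order (fold-left) reduction, plain scalar loop. */
float fold_left_scalar(const float *a, const int *c, size_t n, float acc)
{
    for (size_t i = 0; i < n; i++)
        if (c[i])
            acc += a[i];
    return acc;
}

/* Scalar model of vpcompressps: pack the lanes selected by mask to the
   front of dst, preserving their original order. */
static void compress_model(float *dst, const float *src, uint16_t mask,
                           unsigned lanes)
{
    assert(lanes <= LANES);
    unsigned k = 0;
    for (unsigned l = 0; l < lanes; l++)
        if (mask & (1u << l))
            dst[k++] = src[l];
}

/* Model of the masked fold-left: per vector, build the mask, compress the
   active lanes, count them, then add them to the accumulator in order. */
float fold_left_compress(const float *a, const int *c, size_t n, float acc)
{
    for (size_t i = 0; i < n; i += LANES) {
        unsigned lanes = (n - i < LANES) ? (unsigned)(n - i) : LANES;

        uint16_t mask = 0;                      /* models the k-mask */
        for (unsigned l = 0; l < lanes; l++)
            if (c[i + l])
                mask |= (uint16_t)(1u << l);

        float packed[LANES];
        compress_model(packed, a + i, mask, lanes);           /* vpcompress */
        unsigned count = (unsigned)__builtin_popcount(mask);  /* kmov + popcnt */

        for (unsigned k = 0; k < count; k++)    /* in-order adds only */
            acc += packed[k];
    }
    return acc;
}
```

Both functions perform the same additions in the same order, so they return identical results. The model only shows where the extra mask move, compress and population count sit relative to the add chain; it says nothing about which variant is faster on real hardware.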
