https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111874
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
> For the case of conditional (or loop masked) fold-left reductions the scalar
> fallback isn't implemented.  But AVX512 has vpcompress that could be used
> to implement a more efficient sequence for a masked fold-left, possibly
> using a loop and population count of the mask.

That needs an extra kmov + vpcompress + popcnt on top of the in-order adds, so I'm afraid the performance could be worse than the scalar version.
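
To make the comparison concrete, here is a rough, portable C sketch of the two shapes being discussed. It is not what the vectorizer emits and it does not use AVX512 intrinsics: `compress_lanes` is a hypothetical scalar stand-in for vpcompress, `__builtin_popcount` (a GCC/Clang builtin) stands in for kmov + popcnt, and `VL`, the function names and the 8-bit mask are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define VL 8  /* assumed lane count: 8 doubles in one 512-bit vector */

/* Scalar version: in-order (fold-left) sum of the lanes enabled in mask. */
static double masked_fold_left_scalar(double acc, const double *v, uint8_t mask)
{
    for (int i = 0; i < VL; i++)
        if (mask & (1u << i))
            acc += v[i];
    return acc;
}

/* Hypothetical stand-in for vpcompress: pack the enabled lanes to the
   front of out, preserving their original order. */
static void compress_lanes(double *out, const double *v, uint8_t mask)
{
    int k = 0;
    for (int i = 0; i < VL; i++)
        if (mask & (1u << i))
            out[k++] = v[i];
}

/* Compress-based version: pack the active lanes, count them, then do
   exactly that many in-order adds. */
static double masked_fold_left_compress(double acc, const double *v, uint8_t mask)
{
    double packed[VL] = {0};
    compress_lanes(packed, v, mask);       /* models vpcompress */
    int n = __builtin_popcount(mask);      /* models kmov + popcnt */
    assert(n <= VL);
    for (int i = 0; i < n; i++)            /* adds over active lanes only */
        acc += packed[i];
    return acc;
}
```

Both functions perform the same additions in the same order, so they return identical results. The compress-based form only removes the per-lane mask test from the add chain, and pays for that with the extra compress and count steps, which is the overhead the comment above is concerned about.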