https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201

--- Comment #10 from Marc Glisse <glisse at gcc dot gnu.org> ---
For AVX512, I wonder if we could use vpsadbw (against zero) to compute the byte
sums for each 64-bit part, then vcompressb to collect them in the lower 64 bits,
then a final vpsadbw to conclude. Or some other, faster variant (is Peter Cordes
around?). But that's not required for this patch.
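
To make the data flow of the sequence suggested above concrete, here is a
portable scalar model in C. It is only a sketch of the semantics, not
intrinsics code and not what the patch emits. The helper names
(sad_vs_zero, hsum_bytes_model) are hypothetical, and it assumes the goal is
the sum of 64 unsigned bytes modulo 256, so that only the low byte of each
64-bit partial sum needs to be collected.

```c
#include <stddef.h>
#include <stdint.h>

/* Models vpsadbw with an all-zero second operand: each 64-bit lane
   receives the sum of the 8 unsigned bytes in that lane. */
static void sad_vs_zero(const uint8_t *in, uint64_t *out, size_t lanes)
{
    for (size_t l = 0; l < lanes; l++) {
        uint64_t s = 0;
        for (size_t b = 0; b < 8; b++)
            s += in[8 * l + b];
        out[l] = s;
    }
}

/* Hypothetical model of the three-step idea for one 512-bit vector. */
uint8_t hsum_bytes_model(const uint8_t v[64])
{
    uint64_t lane_sums[8];
    uint8_t packed[8];
    uint64_t total;

    /* Step 1: vpsadbw, giving eight partial sums (each at most 8 * 255). */
    sad_vs_zero(v, lane_sums, 8);

    /* Step 2: models vcompressb with a mask selecting byte 0 of every
       64-bit lane (assumption: the result is wanted modulo 256, so the
       upper bits of each partial sum can be dropped). */
    for (size_t i = 0; i < 8; i++)
        packed[i] = (uint8_t)lane_sums[i];

    /* Step 3: a second vpsadbw over the low 64 bits finishes the sum. */
    sad_vs_zero(packed, &total, 1);

    return (uint8_t)total;
}
```

For example, with v[i] = i for i = 0..63 the exact sum is 2016, so the model
returns 2016 mod 256 = 224. Note that vcompressb needs AVX512_VBMI2, so a real
implementation along these lines would not be available on every AVX512 target.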
