https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80846
--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On September 7, 2017 1:53:47 PM GMT+02:00, "jakub at gcc dot gnu.org" <gcc-bugzi...@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80846
>
>--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
>(In reply to Richard Biener from comment #11)
>> that's not using the unpacking strategy (sum adjacent elements) but still the
>> vector shift approach (add upper/lower halves). That's sth that can be
>> changed independently.
>>
>> Waiting for final vec_extract/init2 optab settling.
>
>That should be settled now.
>
>BTW, for reductions in PR80324 I've added for avx512fintrin.h __MM512_REDUCE_OP
>which for reductions from 512-bit vectors uses smaller and smaller vectors,
>perhaps that is something we should use for the reductions emitted by the
>vectorizer too (perhaps through a target hook that would emit gimple for the
>reduction)?

Yeah, I have a patch that does this. The question is how to query the target if the vector sizes share the same register set. Like we wouldn't want to go to mmx register size.

Doing this would also allow to execute the adds for 512 to 128 bit reduction in parallel.
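
For readers following along, the two reduction strategies mentioned in the quoted comments can be sketched in scalar C. This is only an illustrative model, assuming a 512-bit vector of sixteen 32-bit lanes represented as a plain array. The names `LANES`, `reduce_add_halving` and `reduce_add_adjacent` are made up for this sketch; it is neither the code the vectorizer emits nor the body of __MM512_REDUCE_OP.

```c
#define LANES 16  /* assumption: one 512-bit vector of 32-bit ints */

/* "Add upper/lower halves": fold the upper half onto the lower half,
   then continue on a vector of half the width (16 -> 8 -> 4 -> 2 -> 1
   lanes). Each step only needs the lower half of the previous result,
   which is what makes switching to a smaller vector size possible. */
static int reduce_add_halving(const int *v)
{
    int tmp[LANES];
    for (int i = 0; i < LANES; i++)
        tmp[i] = v[i];
    for (int width = LANES / 2; width >= 1; width /= 2)
        for (int i = 0; i < width; i++)
            tmp[i] += tmp[i + width];
    return tmp[0];
}

/* "Sum adjacent elements": add neighbouring pairs and pack the results
   into the low lanes, halving the number of live lanes each round. */
static int reduce_add_adjacent(const int *v)
{
    int tmp[LANES];
    for (int i = 0; i < LANES; i++)
        tmp[i] = v[i];
    for (int n = LANES; n > 1; n /= 2)
        for (int i = 0; i < n / 2; i++)
            tmp[i] = tmp[2 * i] + tmp[2 * i + 1];
    return tmp[0];
}
```

Both functions return the same sum for integer inputs; they differ only in which lanes are combined at each step, which is what determines the shuffles or extracts a real vector implementation would need.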