https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65701
--- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> ---

It would be nice to test it on an AVX-enabled Intel CPU. There are IMO at least two things: first is the vectorizer oddity, second is that the fastest code seems to happen with large-function-insns=1000. I have no problem with adjusting this parameter (it has been largely untuned for years), but I would like to have some idea why the inlining hurts and whether we can fix that instead. Sadly the functions are quite big...