https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> --- Actually, vectorization hurts with both compilers, and a bit more with clang. It seems that all the important loops are hand-vectorized, and since register pressure is a problem, vectorizing the other loops causes enough collateral damage to register allocation to regress performance. I believe the core of the problem (or at least one of them) is simply the way we compile loops that pop data from a std::vector-based stack; see PR109849. We keep updating the stack data structure in memory in the innermost loop because, in the not-too-common case, reallocation needs to be done, and that is done by offlined (out-of-line) code.
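
To make the loop shape concrete, here is a minimal sketch of the pattern described above. It is not taken from the benchmark or from PR109849: the function names and the tree-counting workload are hypothetical and chosen only for illustration. The first function pops from and pushes to a std::vector used as a stack, so the rarely taken reallocation path inside push_back is reachable from the inner loop. The second assumes the maximum stack depth is known up front and keeps the stack top in a local variable, which may make it easier for a compiler to hold it in a register; whether a given GCC or clang version actually does so is not verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical workload: visit every node of an implicit complete binary
// tree of the given depth, using a std::vector as an explicit stack.
std::uint64_t count_nodes(unsigned depth) {
    std::vector<unsigned> stack;
    stack.push_back(depth);
    std::uint64_t visited = 0;
    while (!stack.empty()) {
        unsigned d = stack.back();
        stack.pop_back();            // only moves the end of the vector
        ++visited;
        if (d > 0) {
            // push_back may need to reallocate; that rare path is typically
            // out-of-line code that reads the vector object from memory.
            stack.push_back(d - 1);
            stack.push_back(d - 1);
        }
    }
    return visited;
}

// Same traversal, assuming the stack never holds more than depth + 1
// entries (true for this workload), so no reallocation path is needed and
// the stack top is a plain local variable.
std::uint64_t count_nodes_local_top(unsigned depth) {
    std::vector<unsigned> buf(static_cast<std::size_t>(depth) + 1);
    unsigned* data = buf.data();
    std::size_t top = 0;
    data[top++] = depth;
    std::uint64_t visited = 0;
    while (top != 0) {
        unsigned d = data[--top];    // pop
        ++visited;
        if (d > 0) {
            assert(top + 2 <= buf.size());  // capacity assumption holds
            data[top++] = d - 1;
            data[top++] = d - 1;
        }
    }
    return visited;
}
```

Both functions return the same count; comparing the assembly generated for the two inner loops (for example with -O2 -S) is one way to see how often the vector's bookkeeping is stored back to memory.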