https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99785
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Mike Hommey from comment #7)
> It's worth noting that the clang variant of the code makes use of
> __builtin_shufflevector, which the gcc variant doesn't (per
> https://searchfox.org/mozilla-central/source/gfx/wr/swgl/src/vector_type.h),
> so the build time comparison might be influenced by that. clang does manage
> to inline blend_pixels, though, and the resulting code is much smaller than
> what GCC produces.

It is not exactly __builtin_shufflevector; rather, in the clang build VectorType is just a native vector type (declared with ext_vector_type), while GCC's version uses a template class for it.

Can you disable the ext_vector_type usage in the header and see how slow clang becomes?
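
To make the distinction concrete, here is a rough sketch of the two shapes VectorType can take. It is not the actual swgl header: Float4, add4 and lane are hypothetical names, and the non-clang branch wraps a plain array for simplicity, which is an assumption rather than what vector_type.h necessarily does.

```cpp
#if defined(__clang__)
// clang: VectorType is an alias for a native vector type, so operators
// such as + and [] are provided by the compiler itself.
template <typename T, int N>
using VectorType = T __attribute__((ext_vector_type(N)));
#else
// Other compilers: VectorType is a class template (simplified here to a
// wrapped array), so every operation goes through user-defined operators
// that the compiler has to instantiate and inline.
template <typename T, int N>
struct VectorType {
  T data[N];

  T& operator[](int i) { return data[i]; }
  const T& operator[](int i) const { return data[i]; }

  friend VectorType operator+(const VectorType& a, const VectorType& b) {
    VectorType r{};
    for (int i = 0; i < N; ++i) r.data[i] = a.data[i] + b.data[i];
    return r;
  }
};
#endif

// Hypothetical convenience alias used only in this sketch.
using Float4 = VectorType<float, 4>;

// Lane-wise addition; identical source for both definitions above.
inline Float4 add4(const Float4& a, const Float4& b) { return a + b; }

// Read a single lane; identical source for both definitions above.
inline float lane(const Float4& v, int i) { return v[i]; }
```

Calling code such as add4(a, b) looks the same either way, but in the class-template form the compiler sees templates and overloaded operators instead of a built-in vector type. Forcing the second branch under clang would be one way to run the experiment asked for above.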