https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116312
ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> >but we could implement it as a simple final assembly output template change
> >for minimal invasion.
>
> No you can't since ldp and ld2 mean 2 different things.
>
> ld2 is basically a perm to unmix the two registers. that is load lanes.
>
> Note in the GCC case there is only one fadd while in LLVM there are 2 though
> independent.
>
> so the question becomes is the ldp better than ld2 here? overall or just
> looking at the ldp vs ld2?

Yeah you're right, it's too early in the morning for me...
It could be that LLVM's vectorisation approach here is better, but that's a
separate discussion
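
For readers following the ldp vs ld2 point in the quoted reply, here is a rough sketch of the difference in plain C. It is only a model of the two loads, not NEON intrinsics and not the test case from this bug (which is not shown in this message). The names `vec4f`, `model_ldp` and `model_ld2` are hypothetical, and the choice of four 32-bit float lanes per register is an assumption made for illustration.

```c
#include <string.h>

/* Hypothetical model of a 128-bit vector register holding four floats. */
typedef struct { float lane[4]; } vec4f;

/* Model of LDP of two Q registers: two consecutive 16-byte chunks are
   loaded as-is, with no reordering of elements. */
static void model_ldp(const float *p, vec4f *a, vec4f *b)
{
    memcpy(a->lane, p, sizeof a->lane);      /* elements 0..3 */
    memcpy(b->lane, p + 4, sizeof b->lane);  /* elements 4..7 */
}

/* Model of LD2 with two .4S registers: the same 32 bytes are read, but the
   elements are de-interleaved, so even-indexed elements land in the first
   register and odd-indexed elements in the second. */
static void model_ld2(const float *p, vec4f *a, vec4f *b)
{
    for (int i = 0; i < 4; i++) {
        a->lane[i] = p[2 * i];      /* elements 0, 2, 4, 6 */
        b->lane[i] = p[2 * i + 1];  /* elements 1, 3, 5, 7 */
    }
}
```

With memory holding 0..7, `model_ldp` yields {0,1,2,3} and {4,5,6,7}, while `model_ld2` yields {0,2,4,6} and {1,3,5,7}. Both read the same 32 bytes, but the register contents differ, so swapping one mnemonic for the other in the output template would change what the following instructions compute.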