https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119100
--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
It's even more complicated than that. You have to consider that there can be a cost to move data across the units, i.e., it may actually be cheaper to use the variant that broadcasts the value across a vector (vv form) rather than using a value from the scalar int/fp register file (vf/vi forms). It really depends on the uarch behavior.

Profitability may also depend on how many other similar cases are nearby. At least in our uarch we have the concept of a "scalar source buffer" where these values are queued up speculatively from the scalar units into a limited-size buffer for consumption on the vector units. If you don't fill up that buffer, then the vf/vi forms are likely profitable, but if you do fill it up, then you're going to stall various things waiting for that buffer to drain and make entries available.

My general sense is that we probably want to default towards the vf/vi forms, but I don't have empirical data to back that up yet.

Paul -- have you run your patch on any design? And if so, what did you run and what was the performance delta before/after?
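
To make the buffer trade-off above concrete, here is a toy cost comparison. It is only a sketch under assumed numbers: `UarchModel`, its fields, and the three functions are hypothetical names invented for illustration, not GCC hooks or parameters, and the linear cost formulas are a deliberate simplification of whatever a real uarch does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UarchModel:
    """Hypothetical tuning knobs; none of these correspond to real GCC params."""
    buffer_entries: int   # capacity of the scalar source buffer
    xfer_cost: int        # cost of feeding one scalar operand to the vector unit (vf/vi)
    stall_cost: int       # extra cost per operand that arrives when the buffer is full
    broadcast_cost: int   # cost of one explicit broadcast feeding a vv-form op


def scalar_form_cost(model: UarchModel, nearby_scalar_ops: int) -> int:
    """Estimated cost of using vf/vi forms for all nearby ops."""
    # Operands beyond the buffer capacity are assumed to stall.
    overflow = max(0, nearby_scalar_ops - model.buffer_entries)
    return nearby_scalar_ops * model.xfer_cost + overflow * model.stall_cost


def broadcast_form_cost(model: UarchModel, nearby_scalar_ops: int) -> int:
    """Estimated cost of broadcasting each scalar and using the vv form."""
    return nearby_scalar_ops * model.broadcast_cost


def prefer_scalar_form(model: UarchModel, nearby_scalar_ops: int) -> bool:
    """Pick vf/vi unless the broadcast form is strictly cheaper (ties go to vf/vi)."""
    return (scalar_form_cost(model, nearby_scalar_ops)
            <= broadcast_form_cost(model, nearby_scalar_ops))
```

With an assumed model such as `UarchModel(buffer_entries=4, xfer_cost=1, stall_cost=5, broadcast_cost=2)`, a couple of nearby scalar-operand ops favor the vf/vi forms, while a long run of them overflows the buffer and tips the choice to the vv form. Breaking ties toward vf/vi mirrors the default suggested above; real numbers would have to come from measurements on an actual design.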