https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119100

--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
It's even more complicated than that.  You have to consider that there can be a
cost to move data across the units.  i.e., it may actually be cheaper to use the
variant that broadcasts the value across a vector (vv form) rather than using a
value from the scalar int/fp register file (vf/vi forms).  It really depends on
the uarch behavior.

Profitability may also depend on how many other similar cases are nearby.  At
least in our uarch we have the concept of a "scalar source buffer" where these
values are queued up speculatively from the scalar units into a limited-size
buffer for consumption on the vector units.  If you don't fill up that buffer,
then the vf/vi forms are likely profitable, but if you fill up the buffer, then
you're going to stall various things waiting for that buffer to drain and make
entries available.
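
To make that concrete, here is a rough sketch of the kind of decision I have in
mind.  Everything in it is hypothetical: the struct, the field names, and the
function are made up for illustration and are not part of GCC's cost model or
of any real design.  It only captures the two points above: the relative cost of
the cross-unit move versus the broadcast, and backing off once the scalar source
buffer would be full.

```c
#include <stdbool.h>

/* Hypothetical per-uarch tuning knobs; illustrative only, not a GCC API.  */
struct vec_scalar_tuning
{
  int xfer_cost;        /* Cost of feeding a scalar reg to the vector unit.  */
  int broadcast_cost;   /* Cost of broadcasting the value for the vv form.  */
  int src_buffer_slots; /* Entries in the scalar source buffer.  */
};

/* Return true if the scalar-operand (vf/vi) form looks profitable, given how
   many other scalar-operand vector ops are already queued up nearby.  */
static bool
prefer_scalar_form (const struct vec_scalar_tuning *t, int nearby_scalar_ops)
{
  /* One more entry would not fit, so we would stall waiting for the buffer
     to drain.  Fall back to the vv form.  */
  if (nearby_scalar_ops >= t->src_buffer_slots)
    return false;

  /* Otherwise pick whichever form is cheaper on this uarch.  */
  return t->xfer_cost <= t->broadcast_cost;
}
```

With a 4-entry buffer and a cheap cross-unit move, this picks vf/vi until four
such ops are already pending and then switches to vv.  A real heuristic would
need actual uarch data behind the numbers.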

My general sense is that we probably want to default towards the vf/vi forms,
but I don't have empirical data to back that up yet.

Paul -- have you run your patch on any design?  And if so what did you run and
what was the performance delta before/after?
