On 10/17/2017 07:22 PM, Jan Hubicka wrote:
> According to Agner's tables, gathers range from 12 ops (vgatherdpd) to 66 ops (vpgatherdd). I assume that CPU needs to do following:
In our code, it is basically a "don't care" how much work a gather instruction has to do.
Without gather the most expensive loop in our code couldn't be vectorized (there are only a handful of gather instructions in that loop and dozens of other vector instructions).
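To illustrate the shape of loop I mean, here is a minimal sketch in C. It is a hypothetical kernel, not our actual code: the function name `update` and all array names are made up for the example. The only indexed access is `table[idx[i]]`; the rest is contiguous arithmetic. Without a gather for that one load, the whole loop stays scalar.

```c
#include <stddef.h>

/* Hypothetical kernel, for illustration only (not the code discussed
   in this thread). One indirect load per iteration, surrounded by
   ordinary contiguous vector arithmetic. */
void update(double *restrict out, const double *restrict table,
            const int *restrict idx, const double *restrict a,
            const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Indirect load: vectorizing this needs a gather (or scalar
           loads packed into a vector). */
        double t = table[idx[i]];

        /* Everything else maps to plain vector multiplies and adds. */
        out[i] = a[i] * t + b[i] * (t - a[i]);
    }
}
```

Compiling something like this with `gcc -O3 -mavx2 -fopt-info-vec` should report whether the loop was vectorized; whether the vectorizer actually picks a gather instruction for the indexed load depends on the cost model for the target.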
Kind regards,

-- 
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news