On 10/24/2016 04:25 AM, Paolo Bonzini wrote:
>> >          for (; p + 8 <= e; p += 8) {
>> > -            __builtin_prefetch(p + 8, 0, 0);
>> > +            __builtin_prefetch(p +
>> > +               (8 * cache_line_factor * prefetch_line_dist), 0, 0);
> You should precompute cache_line_bytes * prefetch_line_dist /
> sizeof(uint64_t) in a single variable, prefetch_distance.  This saves
> the effort of loading global variables repeatedly.  Then you can do
> 
>     __builtin_prefetch(p + prefetch_distance, 0, 0);
> 

Let's not complicate things by dividing by sizeof(uint64_t).
If prefetch_distance is kept in bytes and added through a char pointer,
we avoid both that division and the multiply implied by uint64_t
pointer arithmetic.

  __builtin_prefetch((char *)p + prefetch_distance, 0, 0);
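
To spell that out, here is a minimal sketch of how it could fit together.
The values 64 and 4, the helper init_prefetch_distance(), and the function
name buffer_is_zero_sketch() are assumptions for illustration only, not
taken from the patch; only cache_line_bytes, prefetch_line_dist and
prefetch_distance are names used in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed example values; the real ones would be detected elsewhere. */
static size_t cache_line_bytes = 64;
static size_t prefetch_line_dist = 4;

/* Prefetch distance in BYTES, computed once instead of per iteration. */
static size_t prefetch_distance;

/* Hypothetical init hook: one multiply, no division by sizeof(uint64_t). */
static void init_prefetch_distance(void)
{
    prefetch_distance = cache_line_bytes * prefetch_line_dist;
}

/* Returns 1 if every word in [p, e) is zero, 0 otherwise. */
static int buffer_is_zero_sketch(const uint64_t *p, const uint64_t *e)
{
    for (; p + 8 <= e; p += 8) {
        /*
         * Casting to char * makes the addition a plain byte offset, so
         * the compiler does not scale it by sizeof(uint64_t).  As in the
         * original loop, the hint may point past the end of the buffer;
         * the prefetch itself does not fault on such an address.
         */
        __builtin_prefetch((const char *)p + prefetch_distance, 0, 0);

        uint64_t t = p[0] | p[1] | p[2] | p[3]
                   | p[4] | p[5] | p[6] | p[7];
        if (t) {
            return 0;
        }
    }
    /* Tail: fewer than 8 words remain. */
    for (; p < e; p++) {
        if (*p) {
            return 0;
        }
    }
    return 1;
}
```

With those example values prefetch_distance comes out as 256 bytes, i.e.
four cache lines ahead of p, and the loop body only reads one local-ish
variable rather than two globals.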


r~
