Just a note: prefetchnta does not serve the same function as prefetchw. prefetchnta is a non-temporal hint: it prefetches data that the program expects to use *exactly once*, and never again. If this algorithm really wants that behavior, then you might get an improvement by using prefetchnta. However, if the algorithm uses the prefetched data more than once (including by reading other data in the same cacheline), then prefetchnta has the wrong semantics, and will decrease performance.
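To make the distinction concrete, here is a rough sketch rather than anything taken from the patch under discussion. It assumes GCC or Clang, where the third argument of __builtin_prefetch is a locality hint: 0 usually becomes prefetchnta on x86, and 3 usually becomes prefetcht0. The record layout, the 64-byte line size, the prefetch distance, and the function names are all hypothetical.

```c
#include <stddef.h>

#define CACHELINE 64		/* assumed cacheline size */
#define PREFETCH_AHEAD 8	/* hypothetical distance in records; tune by benchmarking */

/* One record per cacheline, so touching a record touches its line once. */
struct record {
	long value;
	char pad[CACHELINE - sizeof(long)];
};

/* Single pass, each line read exactly once: a non-temporal hint fits. */
long sum_once(const struct record *r, size_t n)
{
	long sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (i + PREFETCH_AHEAD < n)
			__builtin_prefetch(&r[i + PREFETCH_AHEAD], 0, 0);
		sum += r[i].value;
	}
	return sum;
}

/* The same lines are read again in a second pass: use a temporal hint. */
long sum_twice(const struct record *r, size_t n)
{
	long sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (i + PREFETCH_AHEAD < n)
			__builtin_prefetch(&r[i + PREFETCH_AHEAD], 0, 3);
		sum += r[i].value;
	}
	for (size_t i = 0; i < n; i++)	/* reuse of the prefetched data */
		sum += r[i].value;
	return sum;
}
```

Both functions compute the same thing regardless of the hint; only the cache behavior differs, which is why the choice has to be validated by measurement rather than by inspection.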
If the algorithm reuses the data, it should use prefetcht0, prefetcht1, or prefetcht2. As with any change to this kind of performance-critical code, you might consider benchmarking.

- Josh Triplett