On Tue, Apr 05, 2016 at 08:20:54PM +0900, Mizuki Asakura wrote:
> > This code is not just there for prefetching. It is an example of
> > using software pipelining:
>
> OK. I understand.
> But the code is very hard to maintain... I've run into too many register
> conflicts.
> # q2 and d2 were used in the same sequence. That cannot exist in aarch64-neon.
>
> Anyway, I'll try to remove unnecessary register copies as you've suggested.
> After that, I'll also try to make benchmarks that compare
> * advance vs none
> * L1 / L2 / L3 (Cortex-A53 doesn't have), keep / strm
> to find the better configuration.
>
> But it is only a result for Cortex-A53 (that you and I have). Can anyone
> test other (expensive :) aarch64 environments?
> (Cortex-Axx, Apple Ax, NVidia Denver, etc, etc...)

If someone can list what to run for a test, I can probably run it on an A57.

-- 
Len Sorensen
_______________________________________________
Pixman mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/pixman
