"Li, Liang Z" <[email protected]> wrote:
>> Rather than trying to cater to multiple assembly instruction implementations
>> ourselves, have you tried taking the ideas in this earlier thread?
>> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html
>>
>> Ideally, libc's memcmp() will already be using the most efficient assembly
>> instructions without us having to reproduce the work of picking the
>> instructions that work best.
>>
>
> Eric, thanks for your information. I hadn't noticed that discussion before.
>
> I rewrote buffer_find_nonzero_offset() using memeqzero4_paolo, then wrote
> a test program that checks a large number of zero pages, and used 'time'
> to record how long each optimization takes. The results:
>
>                        | test 1 | test 2
> --------------------------------------------
> SSE2 time (s)          | 13.696 | 13.533
> AVX2 time (s)          | 10.583 | 10.306
> memeqzero4_paolo (s)   |  9.718 |  9.817
> --------------------------------------------
>
> Paolo's implementation has the best performance. It seems that we can
> remove the SSE2-related intrinsics.
How should I understand that comment? That you are about to send an email
removing the SSE2 support, and that I can forget about this patch?

Thanks, Juan.

>
> Liang
>> --
>> Eric Blake   eblake redhat com    +1-919-301-3266
>> Libvirt virtualization library http://libvirt.org
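For readers unfamiliar with the memeqzero4 idea from the linked thread, here is a minimal C sketch of the general "compare the buffer with itself" trick, assuming a 16-byte prefix. It is not Paolo's code and not QEMU's buffer_find_nonzero_offset(); the name buffer_is_zero_sketch is hypothetical. The prefix is checked directly, then a single overlapping memcmp() of the buffer against itself shifted by 16 bytes leaves the bulk comparison to whatever routine libc picks, which is the point Eric made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a simplified sketch of the "memcmp with self"
 * zero check, not the exact memeqzero4_paolo from the thread. */
static bool buffer_is_zero_sketch(const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t head = len < 16 ? len : 16;   /* prefix length is an assumption */

    assert(data != NULL || len == 0);

    /* Check the short prefix byte by byte. */
    for (size_t i = 0; i < head; i++) {
        if (p[i]) {
            return false;
        }
    }
    if (len <= 16) {
        return true;
    }

    /* The prefix is all zero.  If every byte equals the byte 16
     * positions before it, the whole buffer is zero by induction.
     * memcmp() only reads, so the overlapping ranges are fine. */
    return memcmp(p, p + 16, len - 16) == 0;
}
```

On a zero-filled 4096-byte page this returns true; setting any single byte of the page to a nonzero value makes it return false.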
