mccullocht commented on PR #14978: URL: https://github.com/apache/lucene/pull/14978#issuecomment-3103920973
@ChrisHegarty I also have an attempt at pushing bulk scoring into Lucene in another branch: https://github.com/mccullocht/lucene/tree/bulk-vector-scorer-panama. I pushed the array all the way down rather than trying to dictate the number of vectors scored in bulk, but ultimately scored 4 at a time in the inner loop. I got a 10-12% improvement on an M2 Mac, but it was no better on Graviton 2 or 3 processors. One theory is that Apple processors decode instructions much further ahead than other aarch64 processors, so they are able to prefetch/load ahead and feed in data faster. I want to be able to _prove_ that having this interface allows a faster implementation before marking it ready for review, so I may put together a crude/incomplete `FlatVectorScorer` and sandbox codec that uses a native implementation for prefetching, but omit it from this PR. For larger vectors (1024d+), prefetching vector N+1 while scoring vector N is quite effective in my tests, but there's no way to express this in Java.
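For readers following along, here is a minimal sketch of what "pushing the array all the way down" and scoring 4 at a time in the inner loop could look like. It is not the code in the branch above: it uses plain scalar Java rather than the Panama Vector API, and the class name `BulkDotProductSketch`, the method `bulkScore`, and the on-heap `float[][]` storage are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch only: the names and the float[][] layout are assumptions,
// not the interface proposed in the PR or the code in the linked branch.
final class BulkDotProductSketch {

  /** Scores a single vector; also used for the tail of a bulk request. */
  static float dotProduct(float[] query, float[] vector) {
    float sum = 0f;
    for (int d = 0; d < query.length; d++) {
      sum += query[d] * vector[d];
    }
    return sum;
  }

  /**
   * Scores vectors[ords[i]] against the query for every i and writes the
   * result to scores[i]. The caller passes the whole array of ordinals; the
   * implementation decides how many vectors to process per inner loop (4 here).
   */
  static void bulkScore(float[] query, float[][] vectors, int[] ords, float[] scores) {
    int i = 0;
    int bound = ords.length - (ords.length % 4);
    for (; i < bound; i += 4) {
      float[] v0 = vectors[ords[i]];
      float[] v1 = vectors[ords[i + 1]];
      float[] v2 = vectors[ords[i + 2]];
      float[] v3 = vectors[ords[i + 3]];
      float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
      // Each query element is loaded once and reused for four vectors.
      for (int d = 0; d < query.length; d++) {
        float q = query[d];
        s0 += q * v0[d];
        s1 += q * v1[d];
        s2 += q * v2[d];
        s3 += q * v3[d];
      }
      scores[i] = s0;
      scores[i + 1] = s1;
      scores[i + 2] = s2;
      scores[i + 3] = s3;
    }
    // Tail: fewer than 4 ordinals remain, so score them one at a time.
    for (; i < ords.length; i++) {
      scores[i] = dotProduct(query, vectors[ords[i]]);
    }
  }

  public static void main(String[] args) {
    float[] query = {1f, 2f, 3f};
    float[][] vectors = {{1f, 0f, 0f}, {0f, 1f, 0f}, {0f, 0f, 1f}, {1f, 1f, 1f}, {2f, 2f, 2f}};
    int[] ords = {4, 3, 2, 1, 0};
    float[] scores = new float[ords.length];
    bulkScore(query, vectors, ords, scores);
    System.out.println(Arrays.toString(scores));
  }
}
```

Because the caller hands over the full array of ordinals, the batch width stays an implementation detail that a SIMD or native scorer could change. The sketch does not show prefetching, which as noted above cannot be expressed in Java.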