jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2724038977
I have some small concerns:
 - The 512 step is tied to the number of points per leaf. It's not a big deal at all, since postings are similar: their encoding logic is specialized for blocks of 128. I guess I'd just rather err on the side of a smaller block size than 512, which feels largish.
 - Complexity: the encoding has 3 different sub-encodings: 512, 128 and remainder. Could we have only two?

But my main concern is that I would like to better understand why 512 performs so much better. Something must happen with this 512 step that doesn't happen otherwise, such as different instructions being used, loop unrolling, better CPU pipelining or something else. I have some discomfort merging something that is faster without having at least an intuition of why it's faster, so that I can also understand which JVMs and CPUs would enable this speedup.

Could pipelining be the reason, as 24 (bits per value) * 32 (step) < 2 * 512 (bit width of SIMD instructions)? But then something like 128 should perform well, while your benchmark suggests it's still much worse than 512?
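
To make the three sub-encodings and the arithmetic in the pipelining question concrete, here is a minimal sketch. It assumes the encoder greedily takes 512-value blocks first, then 128-value blocks, then a remainder; that ordering, and the names `BlockSplit`, `split` and `bits`, are hypothetical and not taken from the PR.

```java
// Illustrative sketch only: names and the greedy ordering are assumptions,
// not the PR's actual implementation.
final class BlockSplit {
    static final int LARGE_BLOCK = 512;
    static final int SMALL_BLOCK = 128;

    /** Returns {number of 512-value blocks, number of 128-value blocks, remainder}. */
    static int[] split(int count) {
        int large = count / LARGE_BLOCK;
        int rest = count % LARGE_BLOCK;
        int small = rest / SMALL_BLOCK;
        int remainder = rest % SMALL_BLOCK;
        return new int[] {large, small, remainder};
    }

    /** Total number of bits covered by one step of the given size. */
    static int bits(int bitsPerValue, int step) {
        return bitsPerValue * step;
    }

    public static void main(String[] args) {
        int[] parts = split(1000);
        System.out.println(parts[0] + " x 512, " + parts[1] + " x 128, remainder " + parts[2]);

        // The inequality from the comment: 24 * 32 = 768 bits, 2 * 512 = 1024 bits.
        int lhs = bits(24, 32);
        int rhs = 2 * 512;
        System.out.println(lhs + " < " + rhs + " : " + (lhs < rhs));
    }
}
```

Dropping one of the two fixed block sizes would reduce `split` to a single division plus a remainder, which is the simplification the second bullet asks about.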