tang-hi commented on PR #12417: URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628938905
> you say it is in prefix sum. How are you determining that?

When I noticed a significant drop in performance even with the vectorized code, I re-examined the benchmark. Surprisingly, despite the significant performance advantage in packing and unpacking, the baseline still performed better for `decodeTo32`. On investigation, I found that the baseline's compression format lets it represent two ints in a single long, so its prefix sum needs only 64 iterations. Our modified compression format does not have this property, so I tried computing the prefix sum directly after decoding (and fully unrolling it). After rerunning the benchmark, performance remained roughly the same.
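For illustration, here is a minimal sketch of the "two ints in a single long" prefix sum described above. It is not the actual Lucene code: the class name `PackedPrefixSum`, the method names, the lane layout, and the block size of 128 are assumptions made for this example. It also assumes non-negative deltas whose running sums fit in 31 bits, so the low lane never carries into the high lane.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's implementation.
final class PackedPrefixSum {
    // Assumed block size; 128 ints pack into 64 longs.
    public static final int BLOCK_SIZE = 128;
    public static final int HALF = BLOCK_SIZE / 2;

    // Assumed layout: deltas[i] in the high 32 bits, deltas[HALF + i] in the low 32 bits.
    public static long[] pack(int[] deltas) {
        long[] packed = new long[HALF];
        for (int i = 0; i < HALF; i++) {
            packed[i] = ((long) deltas[i] << 32) | (deltas[HALF + i] & 0xFFFFFFFFL);
        }
        return packed;
    }

    // Prefix sum over 64 longs: each long addition advances two running sums at once.
    public static int[] prefixSum(int[] deltas, int base) {
        long[] packed = pack(deltas);
        packed[0] += (long) base << 32;          // seed the high lane with the base value
        for (int i = 1; i < HALF; i++) {
            packed[i] += packed[i - 1];          // one add updates both lanes
        }
        int[] out = new int[BLOCK_SIZE];
        for (int i = 0; i < HALF; i++) {
            out[i] = (int) (packed[i] >>> 32);   // high lane: values 0..63
            out[HALF + i] = (int) packed[i];     // low lane: values 64..127, not yet offset
        }
        int firstHalfTotal = out[HALF - 1];
        for (int i = HALF; i < BLOCK_SIZE; i++) {
            out[i] += firstHalfTotal;            // shift the second half by the first half's total
        }
        return out;
    }

    // Straightforward reference: one addition per int, 128 iterations.
    public static int[] scalarPrefixSum(int[] deltas, int base) {
        int[] out = new int[deltas.length];
        int sum = base;
        for (int i = 0; i < deltas.length; i++) {
            sum += deltas[i];
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] deltas = new int[BLOCK_SIZE];
        Arrays.fill(deltas, 3);
        // Prints whether the packed and scalar variants agree on this input.
        System.out.println(Arrays.equals(prefixSum(deltas, 10), scalarPrefixSum(deltas, 10)));
    }
}
```

The dependent add chain in the packed variant runs over 64 longs instead of 128 ints, which is the advantage the comment attributes to the baseline format. A format that does not leave values in this two-lane layout would have to repack first or fall back to the scalar loop.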