tang-hi commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1628938905

   > you say it is in prefix sum. How are you determining that?
   
   When I noticed a significant drop in performance even with vectorized 
code, I re-examined the benchmark. Surprisingly, despite the significant 
performance advantages in packing and unpacking, the baseline was still 
faster for `decodeTo32`. On investigation, I found that the baseline's 
compression format lets it represent two ints in a single long, so the prefix 
sum needs only 64 iterations. Our modified compression format does not allow 
this, so I tried computing the prefix sum directly after decoding (fully 
unrolled). After rerunning the benchmark, performance stayed about the same.
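
To illustrate the "two ints in a single long" point, here is a minimal sketch of how a packed layout can halve the number of dependent additions in a prefix sum. It is not the code from this PR or from the baseline; the class name, method names, and the exact lane layout (value `i` in the high 32 bits, value `64 + i` in the low 32 bits) are assumptions made for illustration. It also assumes non-negative deltas and that each lane's running sum fits in 32 bits, so the low lane never carries into the high lane.

```java
// Hypothetical sketch: prefix sum over 128 deltas using 64 long additions.
class PackedPrefixSum {
  static final int BLOCK_SIZE = 128;

  /** Reference version: one dependent addition per value (128 iterations). */
  static long[] scalarPrefixSum(int[] deltas, long base) {
    long[] out = new long[deltas.length];
    long sum = base;
    for (int i = 0; i < deltas.length; i++) {
      sum += deltas[i];
      out[i] = sum;
    }
    return out;
  }

  /** Packed version: two 32-bit lanes per long, so the running sum takes 64 iterations. */
  static long[] packedPrefixSum(int[] deltas, long base) {
    final int half = BLOCK_SIZE / 2;

    // Pack deltas[i] into the high lane and deltas[half + i] into the low lane.
    // (Assumed layout; a real decoder would produce this form directly.)
    long[] packed = new long[half];
    for (int i = 0; i < half; i++) {
      packed[i] = ((long) deltas[i] << 32) | (deltas[half + i] & 0xFFFFFFFFL);
    }

    // Seed the high lane with the base, then sum both lanes at once.
    packed[0] += base << 32;
    for (int i = 1; i < half; i++) {
      packed[i] += packed[i - 1];
    }

    // Split the lanes back out into 128 separate values.
    long[] out = new long[BLOCK_SIZE];
    for (int i = 0; i < half; i++) {
      out[i] = packed[i] >>> 32;
      out[half + i] = packed[i] & 0xFFFFFFFFL;
    }

    // The low lane was summed independently, so add the first half's total to it.
    final long carry = out[half - 1];
    for (int i = half; i < BLOCK_SIZE; i++) {
      out[i] += carry;
    }
    return out;
  }
}
```

The final fix-up loop has no dependency between iterations, unlike the running sum itself, which is why the packed form can come out ahead even though it touches every value twice.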


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

