sgup432 opened a new pull request, #16123:
URL: https://github.com/apache/lucene/pull/16123

   
   ### Description
   
   As a follow-up to one of the review comments from @neoremind and @benwtrent 
on https://github.com/apache/lucene/pull/16050, I am adding this change.
   
   Previously, after the vectorized comparison, we looped over the matching 
bits and updated the bitset one bit at a time. That is unnecessary: we can OR 
the whole match mask into the bitset word in a single operation.
   
   This change also handles the case where the compared docs span two words. 
For example, take base = 60 and 8 docs compared together, giving maskBits 
`0b10110001`.
   
   Docs 60-67 then span two words: 
   - Docs 60-63 are in bits[0] (bit positions 60-63)
   - Docs 64-67 are in bits[1] (bit positions 0-3)
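
The word-spanning case above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class `OrMaskSketch`, the method `orMask`, and its parameters are hypothetical names, and the bitset is assumed to be a plain `long[]` with 64 docs per word.

```java
final class OrMaskSketch {

  /**
   * ORs the low {@code width} bits of {@code maskBits} into {@code bits},
   * where mask bit i corresponds to doc {@code base + i}.
   * Hypothetical helper; assumes 1 <= width <= 64 and that bits is large enough.
   */
  static void orMask(long[] bits, int base, long maskBits, int width) {
    if (maskBits == 0) {
      return; // no matches in this batch
    }
    int wordIndex = base >> 6; // base / 64
    int shift = base & 63;     // base % 64
    // Low part: mask bits that land in the current word. Bits shifted past
    // position 63 are dropped here and handled by the spill below.
    bits[wordIndex] |= maskBits << shift;
    // High part: only needed when the batch crosses a word boundary.
    // The guard implies shift > 0, so the shift distance is in 1..63.
    if (shift + width > 64) {
      bits[wordIndex + 1] |= maskBits >>> (64 - shift);
    }
  }

  public static void main(String[] args) {
    long[] bits = new long[2];
    // Example from the description: base = 60, 8 docs, mask 0b10110001.
    // Mask bits 0, 4, 5, 7 are set, i.e. docs 60, 64, 65 and 67 match.
    orMask(bits, 60, 0b10110001L, 8);
    System.out.println(Long.toBinaryString(bits[0]));
    System.out.println(Long.toBinaryString(bits[1]));
  }
}
```

With base = 60, only mask bit 0 lands in `bits[0]` (at position 60), while mask bits 4, 5 and 7 spill into positions 0, 1 and 3 of `bits[1]`. The boundary check costs one comparison per batch, replacing a loop with one iteration per matching bit.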
   
   
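   Results are below. The change gives a further boost, most consistently for 
the random pattern on AVX-512 (+31% to +41%); several clustered cases are 
unchanged.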
     
     **c5.2xlarge (AVX-512):**
     
     | Pattern | docCount | Fields | Before OR mask (ops/s) | With OR mask (ops/s) | Change |
     |---|---|---|---|---|---|
     | random | 1M | 1 | 200.2 | 273.5 | +37% |
     | random | 1M | 3 | 66.8 | 91.5 | +37% |
     | random | 1M | 5 | 62.6 | 81.9 | +31% |
     | random | 10M | 1 | 24.4 | 34.4 | +41% |
     | random | 10M | 3 | 8.1 | 11.1 | +37% |
     | random | 10M | 5 | 7.2 | 9.8 | +36% |
     | clustered | 1M | 1 | 8,856 | 11,345 | +28% |
     | clustered | 1M | 3 | 33,846 | 36,577 | +8% |
     | clustered | 1M | 5 | 35,055 | 34,975 | ~0% |
     | clustered | 10M | 1 | 1,308 | 1,324 | +1% |
     | clustered | 10M | 3 | 24,388 | 24,052 | ~0% |
     | clustered | 10M | 5 | 12,940 | 12,884 | ~0% |
     
     **Mac (Apple M-series, 128-bit NEON):**
     
     | Pattern | docCount | Fields | Before OR mask (ops/s) | With OR mask (ops/s) | Change |
     |---|---|---|---|---|---|
     | random | 1M | 1 | 219.5 | 250.2 | +14% |
     | random | 1M | 3 | 86.9 | 93.6 | +8% |
     | random | 1M | 5 | 80.2 | 80.2 | ~0% |
     | random | 10M | 1 | 27.1 | 28.2 | +4% |
     | random | 10M | 3 | 9.0 | 7.5 | -17%* |
     | random | 10M | 5 | 7.7 | 8.7 | +12% |
     | clustered | 1M | 1 | 27,137 | 36,003 | +33% |
     | clustered | 1M | 3 | 56,333 | 81,988 | +46% |
     | clustered | 1M | 5 | 77,159 | 78,315 | +2% |
     | clustered | 10M | 1 | 3,502 | 3,617 | +3% |
     | clustered | 10M | 3 | 57,900 | 61,969 | +7% |
     | clustered | 10M | 5 | 31,728 | 31,502 | ~0% |
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

