gf2121 opened a new pull request, #12587:
URL: https://github.com/apache/lucene/pull/12587

   ### Description
   
   Sort the terms in `TermInSetQuery` with a radix sort. This helps `TermInSetQuery` instances that contain a large number of terms.
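   For readers unfamiliar with the technique, here is a minimal MSB (most-significant-byte-first) radix sort over `byte[]` keys. It orders keys by unsigned lexicographic comparison, which is the order `BytesRef` uses. This is a standalone sketch for illustration, not the code in this PR. The class name `ByteArrayRadixSort` and the insertion-sort cutoff of 16 are hypothetical choices.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: MSB (most-significant-byte-first) radix sort for byte[] keys,
 * ordering them by unsigned lexicographic comparison. Not the code from PR #12587.
 */
public final class ByteArrayRadixSort {

  // Hypothetical cutoff: small ranges are cheaper to finish with insertion sort.
  private static final int INSERTION_SORT_THRESHOLD = 16;

  private ByteArrayRadixSort() {}

  /** Sorts keys in place in unsigned lexicographic order. */
  public static void sort(byte[][] keys) {
    byte[][] scratch = new byte[keys.length][];
    sort(keys, scratch, 0, keys.length, 0);
  }

  // Bucket 0 = key has no byte at `depth` (it ends here); bucket b + 1 = unsigned byte b.
  private static int bucket(byte[] key, int depth) {
    return depth < key.length ? (key[depth] & 0xFF) + 1 : 0;
  }

  // Sorts keys[from, to); every key in this range shares its first `depth` bytes.
  private static void sort(byte[][] keys, byte[][] scratch, int from, int to, int depth) {
    if (to - from <= INSERTION_SORT_THRESHOLD) {
      insertionSort(keys, from, to, depth);
      return;
    }
    // Histogram shifted by one slot, then prefix sums:
    // start[b] is where bucket b begins (relative to `from`), start[257] == to - from.
    int[] start = new int[258];
    for (int i = from; i < to; i++) {
      start[bucket(keys[i], depth) + 1]++;
    }
    for (int b = 1; b < start.length; b++) {
      start[b] += start[b - 1];
    }
    // Stable scatter into scratch by the byte at `depth`, then copy back.
    int[] next = Arrays.copyOf(start, 257);
    for (int i = from; i < to; i++) {
      byte[] key = keys[i];
      scratch[from + next[bucket(key, depth)]++] = key;
    }
    System.arraycopy(scratch, from, keys, from, to - from);
    // Bucket 0 holds keys that are all equal (same prefix, same length), so skip it.
    // Every other bucket shares one more byte, so recurse one level deeper.
    for (int b = 1; b <= 256; b++) {
      int lo = from + start[b];
      int hi = from + start[b + 1];
      if (hi - lo > 1) {
        sort(keys, scratch, lo, hi, depth + 1);
      }
    }
  }

  // Insertion sort comparing only the bytes from `depth` on (earlier bytes are equal).
  private static void insertionSort(byte[][] keys, int from, int to, int depth) {
    for (int i = from + 1; i < to; i++) {
      byte[] key = keys[i];
      int j = i - 1;
      while (j >= from && compareFrom(keys[j], key, depth) > 0) {
        keys[j + 1] = keys[j];
        j--;
      }
      keys[j + 1] = key;
    }
  }

  private static int compareFrom(byte[] a, byte[] b, int depth) {
    return Arrays.compareUnsigned(a, depth, a.length, b, depth, b.length);
  }

  public static void main(String[] args) {
    byte[][] keys = {{(byte) 0xFF}, {1, 2}, {1}, {}, {1, 2, 3}, {(byte) 0x80, 0}};
    sort(keys);
    for (byte[] key : keys) {
      // Prints [], [1], [1, 2], [1, 2, 3], [-128, 0], [-1] on separate lines
      // (Arrays.toString shows bytes as signed values).
      System.out.println(Arrays.toString(key));
    }
  }
}
```

   Each pass distributes keys by one byte in linear time, so the cost grows with the total number of bytes examined rather than with the number of pairwise comparisons a comparison sort performs.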
   
   ### Benchmark
   
   I ran a simple benchmark that sorts a `BytesRef[]` filled with random bytes to verify the improvement.
   
   | Case | TimSort time (ns) | Radix sort time (ns) | Time change |
   | -- | --: | --: | --: |
   | 10 terms (16 bytes per term) | 1292 | 1083 | -16.18% |
   | 100 terms (16 bytes per term) | 17959 | 11750 | -34.57% |
   | 1000 terms (16 bytes per term) | 387916 | 50375 | -87.01% |
   | 10000 terms (16 bytes per term) | 5407208 | 1062500 | -80.35% |
   | 100000 terms (16 bytes per term) | 65577084 | 5404958 | -91.76% |
   | 10 terms (256 bytes per term) | 3500 | 1750 | -50.00% |
   | 100 terms (256 bytes per term) | 18000 | 11708 | -34.96% |
   | 1000 terms (256 bytes per term) | 410959 | 52417 | -87.25% |
   | 10000 terms (256 bytes per term) | 5325666 | 1299125 | -75.61% |
   | 100000 terms (256 bytes per term) | 71316500 | 11346584 | -84.09% |
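   The percentage column is the relative change in sort time, (radix sort time - TimSort time) / TimSort time, so negative values mean radix sort was faster. The small sketch below reproduces two rows of the table; the class and method names are hypothetical and only illustrate the arithmetic.

```java
import java.util.Locale;

public final class TookDiff {
  private TookDiff() {}

  /** Hypothetical helper: percent change in sort time from TimSort to radix sort (negative = faster). */
  public static double tookDiffPercent(long timsortNanos, long radixSortNanos) {
    return (radixSortNanos - timsortNanos) * 100.0 / timsortNanos;
  }

  public static void main(String[] args) {
    // Rows taken from the benchmark table: 10 and 1000 terms, 16 bytes per term.
    System.out.println(String.format(Locale.ROOT, "%.2f%%", tookDiffPercent(1292, 1083)));    // -16.18%
    System.out.println(String.format(Locale.ROOT, "%.2f%%", tookDiffPercent(387916, 50375))); // -87.01%
  }
}
```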
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
