mikemccand opened a new issue, #12704: URL: https://github.com/apache/lucene/issues/12704
### Description

Spinoff from [this cool comment](https://github.com/apache/lucene/pull/12633#discussion_r1366847986), thanks to hashing guru @bruno-roustant:

```
Instead, we should multiply with the gold constant BitMixer#PHI_C64 (make it public).
This really makes a difference in the evenness of the value distribution. This is one of the secrets of the HPPC hashing.

By applying this, we get multiple advantages:
* lookup should be improved (less hash collision)
* we can try to rehash at 3/4 occupancy because the performance should not be impacted until this point.
* in case of hash collision, we can lookup linearly with a pos = pos + 1 instead of quadratic probe (lines 95 and 327); this may avoid some mem cache miss.
* (same for the other hash method)
```

This is a simple change; we just need to test on some real FST building cases to confirm good mixing "in practice". The new `IndexToFST` tool in `luceneutil` is helpful for this.
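For readers unfamiliar with the technique, here is a minimal, self-contained sketch of the three ideas in the quoted comment: multiply by a golden-ratio constant to mix the hash, probe linearly with `pos + 1`, and rehash at 3/4 occupancy. It is not Lucene's or HPPC's code. The class `PhiMixSketch`, its methods, and the "0 means empty" convention are hypothetical, and the constant is the commonly used 2^64 / φ value, which is assumed here to correspond to `BitMixer#PHI_C64`.

```java
// Illustrative sketch only: a tiny open-addressing set of nonzero longs.
final class PhiMixSketch {
    // 2^64 divided by the golden ratio; assumed to match BitMixer#PHI_C64.
    static final long GOLDEN_RATIO_64 = 0x9E3779B97F4A7C15L;

    private long[] table = new long[16]; // power-of-two size; 0 marks an empty slot
    private int count;

    // Multiply to spread the bits, fold the high half into the low half, then mask.
    static int slot(long key, int mask) {
        long h = key * GOLDEN_RATIO_64;
        return (int) (h ^ (h >>> 32)) & mask;
    }

    /** Adds a nonzero key; returns true if it was not already present. */
    boolean add(long key) {
        if (key == 0) throw new IllegalArgumentException("0 is reserved for empty slots");
        if (!insert(table, key)) return false;
        count++;
        // Rehash once occupancy exceeds 3/4, so an empty slot always remains.
        if (count * 4L > table.length * 3L) grow();
        return true;
    }

    boolean contains(long key) {
        int mask = table.length - 1;
        int pos = slot(key, mask);
        while (table[pos] != 0) {
            if (table[pos] == key) return true;
            pos = (pos + 1) & mask; // linear probe: pos = pos + 1
        }
        return false;
    }

    int size() {
        return count;
    }

    private static boolean insert(long[] t, long key) {
        int mask = t.length - 1;
        int pos = slot(key, mask);
        while (t[pos] != 0) {
            if (t[pos] == key) return false;
            pos = (pos + 1) & mask; // linear probe instead of a quadratic step
        }
        t[pos] = key;
        return true;
    }

    private void grow() {
        long[] bigger = new long[table.length * 2];
        for (long k : table) {
            if (k != 0) insert(bigger, k);
        }
        table = bigger;
    }

    public static void main(String[] args) {
        PhiMixSketch set = new PhiMixSketch();
        // Keys whose low 10 bits are all zero: masking the raw key would
        // send every one of them to slot 0 in a small table.
        for (long k = 1; k <= 1000; k++) set.add(k * 1024);
        System.out.println(set.size() + " keys, contains(2048) = " + set.contains(2048));
    }
}
```

Linear probing only stays cheap when the hash spreads keys evenly, which is why the mixing step and the probe change go together. Whether this holds on real FST node hashes is exactly what the issue proposes to measure.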