jpountz opened a new pull request, #875:
URL: https://github.com/apache/lucene/pull/875

   I benchmarked OrdinalMap construction over high-cardinality fields, and lots
   of time is spent in `PriorityQueue#downHeap` due to entry comparisons. I
   added a small hack that speeds up these comparisons a bit by extracting the
   first 8 bytes of each term as a comparable unsigned long, and using this
   long for comparisons whenever possible.
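   
   For reference, here is a minimal sketch of the idea (not the actual patch;
   `PrefixComparison`, `prefix8` and `compareTerms` are illustrative names, not
   Lucene APIs): pack the first 8 bytes of a term into a long whose unsigned
   order matches the term's byte order, and only fall back to a full
   byte-by-byte comparison when the two prefixes are equal.
   
   ```java
   import org.apache.lucene.util.BytesRef;
   
   final class PrefixComparison {
   
     // Extract up to 8 leading bytes as a big-endian unsigned long; shorter
     // terms are zero-padded, which is safe because the fallback below handles
     // the case where the two prefixes are equal.
     static long prefix8(BytesRef term) {
       long prefix = 0;
       int n = Math.min(8, term.length);
       for (int i = 0; i < n; i++) {
         prefix |= (term.bytes[term.offset + i] & 0xFFL) << (56 - (i << 3));
       }
       return prefix;
     }
   
     static int compareTerms(BytesRef a, BytesRef b) {
       int cmp = Long.compareUnsigned(prefix8(a), prefix8(b));
       if (cmp != 0) {
         return cmp; // cheap path: the 8-byte prefixes already differ
       }
       return a.compareTo(b); // slow path: full unsigned byte-by-byte comparison
     }
   }
   ```
   
   The sketch recomputes the prefix on every comparison for clarity; in
   practice it would presumably be computed once per queue entry and cached,
   which is where the speedup comes from.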
   
   On a dataset of 100M documents and 10M unique values, each consisting of 16
   random bytes, OrdinalMap construction went from 9.4s to 6.0s. On the same
   number of docs/values, but with values that share the same 8-byte prefix
   followed by 8 random bytes to simulate a worst-case scenario for this
   change, OrdinalMap construction went from 9.6s to 10.1s. So this looks like
   it can yield a significant speedup in some scenarios, while the slowdown is
   contained in the worst-case scenario?
   
   Unfortunately, this worst-case scenario is not exactly unlikely; e.g. it is
   what you would get with a dataset of IPv4-mapped IPv6 addresses, where all
   values share the same 12-byte prefix.
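   
   A hypothetical illustration of why (reusing the illustrative
   `PrefixComparison` helpers from the sketch above): an IPv4 address encoded
   as an IPv4-mapped IPv6 address is 16 bytes of the form 00..00 FF FF a b c d,
   so the first 8 bytes are always zero and the extracted long never decides a
   comparison, which matches the observed worst-case slowdown.
   
   ```java
   import org.apache.lucene.util.BytesRef;
   
   public class WorstCaseDemo {
     public static void main(String[] args) {
       // 192.0.2.1 and 192.0.2.2 as IPv4-mapped IPv6: 10 zero bytes, 0xFF 0xFF,
       // then the 4 IPv4 bytes. The first 12 bytes are identical.
       byte[] a = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, (byte) 0xFF, (byte) 0xFF, (byte) 192, 0, 2, 1};
       byte[] b = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, (byte) 0xFF, (byte) 0xFF, (byte) 192, 0, 2, 2};
       System.out.println(PrefixComparison.prefix8(new BytesRef(a))); // 0
       System.out.println(PrefixComparison.prefix8(new BytesRef(b))); // 0
       System.out.println(PrefixComparison.compareTerms(new BytesRef(a), new BytesRef(b))); // negative
     }
   }
   ```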

