jpountz commented on pull request #541:
URL: https://github.com/apache/lucene/pull/541#issuecomment-1006408937


   Indeed, I was thinking of supporting only these numbers of bits per value. 
For postings, values are always deltas, so we can generally expect them to be 
small. For BKD trees, small values are more the exception, so I don't think we 
should spend much effort on supporting many different numbers of bits per 
value. Instead, we should focus on the ones that matter:
    - 32 bits per value for large segments where the doc ID order is random.
    - 24 bits per value for medium segments (less than 2^24 docs) where the doc 
ID order is random.
    - 16 bits per value plus delta coding from the minimum doc ID in the block 
for the case where there is some clustering of doc IDs.
    - And maybe the bitset strategy you added recently already covers the other 
cases, like sorted indexes and values that exist in many docs, so that we no 
longer need delta coding between consecutive doc IDs, which is slow anyway 
because of the cumulative sum?
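
To make the proposal concrete, here is a minimal sketch of how a block of doc IDs could be mapped to one of the three widths listed above. It is only an illustration under assumptions: the class `DocIdEncodingSketch`, the `Strategy` enum, and all method names are hypothetical and are not Lucene's API, and the bitset case is left out. Note that decoding the 16-bit form needs one independent addition per value, with no running sum across values.

```java
import java.util.Arrays;

/** Hypothetical sketch, not Lucene code: pick a fixed width for a block of doc IDs. */
public final class DocIdEncodingSketch {

  public enum Strategy { DELTA16, BITS24, BITS32 }

  /** Picks the narrowest of the three encodings that can represent the block. */
  public static Strategy choose(int[] docIds) {
    if (docIds.length == 0) {
      throw new IllegalArgumentException("empty block");
    }
    int min = Arrays.stream(docIds).min().getAsInt();
    int max = Arrays.stream(docIds).max().getAsInt();
    if (max - min <= 0xFFFF) {
      return Strategy.DELTA16; // clustered doc IDs: offsets from min fit in 16 bits
    }
    if (max <= 0xFFFFFF) {
      return Strategy.BITS24; // every doc ID is below 2^24
    }
    return Strategy.BITS32;
  }

  /** 16 bits per value: each doc ID is stored as an offset from the block minimum. */
  public static short[] encodeDelta16(int[] docIds, int min) {
    short[] out = new short[docIds.length];
    for (int i = 0; i < docIds.length; i++) {
      out[i] = (short) (docIds[i] - min);
    }
    return out;
  }

  /** Each value is restored independently; no cumulative sum is needed. */
  public static int[] decodeDelta16(short[] encoded, int min) {
    int[] out = new int[encoded.length];
    for (int i = 0; i < encoded.length; i++) {
      out[i] = min + (encoded[i] & 0xFFFF); // mask to read the short as unsigned
    }
    return out;
  }

  /** 24 bits per value: three big-endian bytes per doc ID. */
  public static byte[] encode24(int[] docIds) {
    byte[] out = new byte[docIds.length * 3];
    for (int i = 0; i < docIds.length; i++) {
      out[3 * i] = (byte) (docIds[i] >>> 16);
      out[3 * i + 1] = (byte) (docIds[i] >>> 8);
      out[3 * i + 2] = (byte) docIds[i];
    }
    return out;
  }

  public static int[] decode24(byte[] encoded) {
    int[] out = new int[encoded.length / 3];
    for (int i = 0; i < out.length; i++) {
      out[i] = (encoded[3 * i] & 0xFF) << 16
          | (encoded[3 * i + 1] & 0xFF) << 8
          | (encoded[3 * i + 2] & 0xFF);
    }
    return out;
  }

  public static void main(String[] args) {
    int[] clustered = {100_000, 100_005, 165_535};
    int min = Arrays.stream(clustered).min().getAsInt();
    int[] roundTrip = decodeDelta16(encodeDelta16(clustered, min), min);
    System.out.println(choose(clustered) + " " + Arrays.equals(clustered, roundTrip));
  }
}
```

The 32-bit case is simply the raw `int` values, so it needs no helper here.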


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


