jpountz commented on pull request #541: URL: https://github.com/apache/lucene/pull/541#issuecomment-1006408937
I was indeed thinking of supporting only these numbers of bits per value. For postings, the numbers are always deltas, so we can generally expect them to be small. For BKD trees, small values are more the exception, so I don't think we should spend too much effort on supporting many different numbers of bits per value. We should focus only on the ones that matter:

- 32 bits per value for large segments where the doc ID order is random.
- 24 bits per value for medium segments (less than 2^24 docs) where the doc ID order is random.
- 16 bits per value plus delta coding from the minimum doc ID in the block, for the case where there is some clustering of doc IDs.
- And maybe the bitset strategy you added recently already covers the other cases, like sorted indexes and values that exist in many docs, so that we no longer need delta coding between consecutive doc IDs, which is slow anyway due to the cumulative sum?
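To make the proposal concrete, here is a minimal sketch of how a block of doc IDs could be assigned to one of the three fixed widths. It is an illustration only, not the code in this PR: the class `DocIdBlockSketch`, the `Strategy` enum, and the methods `choose`, `encodeDelta16`, and `decodeDelta16` are hypothetical names, and the bitset case is left out entirely.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's actual implementation.
final class DocIdBlockSketch {

  // The three fixed widths discussed in the comment (names are assumptions).
  enum Strategy { DELTA16, BITS24, BITS32 }

  // Picks a width for the first `count` doc IDs of a block.
  // Doc IDs are assumed to be non-negative, so max - min cannot overflow.
  static Strategy choose(int[] docIds, int count) {
    if (count <= 0) {
      throw new IllegalArgumentException("block must not be empty");
    }
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    for (int i = 0; i < count; i++) {
      min = Math.min(min, docIds[i]);
      max = Math.max(max, docIds[i]);
    }
    if (max - min <= 0xFFFF) {
      return Strategy.DELTA16; // clustered: each delta from min fits in 16 bits
    }
    if (max <= 0xFFFFFF) {
      return Strategy.BITS24; // every doc ID is below 2^24
    }
    return Strategy.BITS32;
  }

  // Stores each doc ID as an unsigned 16-bit offset from the block minimum.
  static short[] encodeDelta16(int[] docIds, int count, int min) {
    short[] deltas = new short[count];
    for (int i = 0; i < count; i++) {
      deltas[i] = (short) (docIds[i] - min);
    }
    return deltas;
  }

  // Each value decodes independently of the others: no cumulative sum.
  static int[] decodeDelta16(short[] deltas, int min) {
    int[] docIds = new int[deltas.length];
    for (int i = 0; i < deltas.length; i++) {
      docIds[i] = min + (deltas[i] & 0xFFFF);
    }
    return docIds;
  }

  public static void main(String[] args) {
    int[] clustered = {1000, 1005, 66535, 1200};
    System.out.println(choose(clustered, clustered.length));
    short[] deltas = encodeDelta16(clustered, clustered.length, 1000);
    System.out.println(Arrays.toString(decodeDelta16(deltas, 1000)));
  }
}
```

The point of the sketch is that decoding a delta from the block minimum touches each value independently, whereas delta coding between consecutive doc IDs needs a running sum across the whole block.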