[GitHub] [lucene] gf2121 commented on pull request #510: LUCENE-10280: Store BKD blocks with continuous ids more efficiently

GitBox Wed, 08 Dec 2021 10:58:15 -0800


gf2121 commented on pull request #510:
URL: https://github.com/apache/lucene/pull/510#issuecomment-989104054



   @iverase Thanks for your explanation!
   
   > I worked on the PR about using #readLELongs but never get a meaningful 
speed up that justify the added complexity.
   
   I see that we were trying to use #readLELongs to speed up the 24/32-bit 
case in the `DocIdsWriter`, which means the ids in the block are unsorted, 
as typically happens in high cardinality fields. I think queries on high 
cardinality fields spend most of their time in `visitDocValues` rather than 
`readDocIds`, so maybe this is why we cannot see an obvious gain on the 
E2E side?
   
   My current thinking is to use readLELongs to speed up the **sorted** 
ids case (that is, low or medium cardinality fields), whose bottleneck is 
reading docIds. For sorted arrays, we can compute the deltas of the sorted ids 
and encode/decode them the way we do in `StoredFieldsInts`. 
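
   To make the delta idea concrete, here is a minimal sketch. It is not the 
code from the PR or the issue, and it does not call `StoredFieldsInts` or 
readLELongs; the class and method names are hypothetical. It only shows that 
the gaps between sorted ids need fewer bits than the ids themselves, which is 
what makes a narrower per-value encoding possible.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Lucene.
class SortedDocIdDeltaSketch {

  // Replace each id with its gap from the previous id.
  // The first id is kept separately as the base, so deltas[0] is 0.
  static int[] toDeltas(int[] sortedIds) {
    int[] deltas = new int[sortedIds.length];
    for (int i = 1; i < sortedIds.length; i++) {
      deltas[i] = sortedIds[i] - sortedIds[i - 1];
    }
    return deltas;
  }

  // Rebuild the original ids with a running sum starting from the base.
  static int[] fromDeltas(int base, int[] deltas) {
    int[] ids = new int[deltas.length];
    int current = base;
    for (int i = 0; i < deltas.length; i++) {
      current += deltas[i];
      ids[i] = current;
    }
    return ids;
  }

  // Number of bits needed to represent the largest (non-negative) value.
  static int bitsRequired(int[] values) {
    int or = 0;
    for (int v : values) {
      or |= v;
    }
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
  }

  public static void main(String[] args) {
    int[] ids = {1000, 1001, 1003, 1010, 1042};
    int[] deltas = toDeltas(ids);
    System.out.println("deltas = " + Arrays.toString(deltas));
    System.out.println("bits per raw id = " + bitsRequired(ids));
    System.out.println("bits per delta  = " + bitsRequired(deltas));
    System.out.println("round trip ok   = "
        + Arrays.equals(ids, fromDeltas(ids[0], deltas)));
  }
}
```

   In this toy block the raw ids need 11 bits each while the deltas fit in 6, 
so a block of sorted ids could be written as a base id plus narrow deltas.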
   
   I raised an [ISSUE](https://issues.apache.org/jira/browse/LUCENE-10297) 
based on this idea. The benchmark results I posted in the issue look promising. 
Would you mind taking a look when you are free? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



