[GitHub] [lucene] jpountz commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

GitBox Thu, 24 Feb 2022 05:24:53 -0800


jpountz commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1049857125



   @rmuir We can remove the cost estimation, but it will not address the 
problem. I'll try to explain the problem differently in case it helps.
   
   DocIdSetBuilder takes doc IDs in random order with potential duplicates and 
creates a DocIdSet that can iterate over doc IDs in order without any 
duplicates. If you index a multi-valued field with points, a very large segment 
that has 2^30 docs might have 2^32 points matching a range query, which 
translates into 2^29 documents matching the query. So `DocIdSetBuilder#add` would 
be called 2^32 times and `DocIdSetBuilder#build` would result in a `DocIdSet` 
that has 2^29 documents. The argument to `grow` measures the number of calls to 
`DocIdSetBuilder#add`, which can exceed the `int` range as in this example, 
hence the `long`.
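   
   To make the distinction between calls to `add` and resulting documents 
concrete, here is a minimal sketch. `ToyDocIdSetBuilder` and its methods are 
made up for illustration and are not Lucene's implementation; the sketch only 
mimics the contract described above (unordered input with duplicates, ordered 
output without duplicates).

```java
import java.util.Arrays;

/** Toy stand-in for illustration only; this is NOT Lucene's DocIdSetBuilder. */
final class ToyDocIdSetBuilder {
    private int[] buffer = new int[8];
    private int size = 0;
    // Counts calls to add(), duplicates included, so it can exceed the doc count.
    private long numCallsToAdd = 0;

    void add(int docId) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2);
        }
        buffer[size++] = docId;
        numCallsToAdd++;
    }

    long numCallsToAdd() {
        return numCallsToAdd;
    }

    /** Returns the collected doc IDs sorted and without duplicates. */
    int[] build() {
        return Arrays.stream(buffer, 0, size).sorted().distinct().toArray();
    }

    public static void main(String[] args) {
        ToyDocIdSetBuilder builder = new ToyDocIdSetBuilder();
        // A multi-valued field: 7 matching points that belong to only 3 docs.
        int[] matchingPoints = {5, 2, 5, 9, 2, 5, 9};
        for (int doc : matchingPoints) {
            builder.add(doc);
        }
        // prints "7 calls -> [2, 5, 9]"
        System.out.println(builder.numCallsToAdd() + " calls -> "
            + Arrays.toString(builder.build()));
    }
}
```

   The same ratio at segment scale is what pushes the call count past what an 
`int` can hold, even though the number of matching documents still fits.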
   
   The naming may be wrong here, as the `grow` name probably suggests a number 
of docs rather than a number of calls to `add`, similarly to how 
`ArrayUtil#grow` is about the number of items in the array - not the number of 
times you set an index. Hopefully renaming it to `prepareAdd(long 
numCallsToAdd)` or something along these lines would help clarify.
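   
   A rough sketch of what such a signature could communicate. The class 
`PrepareAddSketch`, the `remainingCalls` helper, and the check in `add` are 
assumptions for the sake of the example; only the `prepareAdd(long 
numCallsToAdd)` name comes from the suggestion above.

```java
/** Hypothetical API shape; not an existing Lucene class. */
final class PrepareAddSketch {
    private long reservedCalls = 0;
    private long callsMade = 0;

    /**
     * Declares how many calls to add() will follow. Duplicates count,
     * so this may be larger than the number of documents.
     */
    void prepareAdd(long numCallsToAdd) {
        if (numCallsToAdd < 0) {
            throw new IllegalArgumentException("numCallsToAdd must be >= 0");
        }
        reservedCalls = Math.addExact(reservedCalls, numCallsToAdd);
    }

    void add(int docId) {
        if (callsMade == reservedCalls) {
            throw new IllegalStateException("add() called more often than prepared");
        }
        callsMade++;
        // A real builder would buffer docId here; omitted in this sketch.
    }

    long remainingCalls() {
        return reservedCalls - callsMade;
    }

    public static void main(String[] args) {
        PrepareAddSketch sketch = new PrepareAddSketch();
        long calls = 1L << 32; // 2^32 calls, as in the example above
        sketch.prepareAdd(calls);
        System.out.println(sketch.remainingCalls()); // the count is kept as a long
        System.out.println((int) calls);             // narrowing 2^32 to int loses the value
    }
}
```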


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



