jpountz commented on pull request #692: URL: https://github.com/apache/lucene/pull/692#issuecomment-1049857125
@rmuir We can remove the cost estimation, but that will not address the problem. I'll try to explain the problem differently in case it helps.

`DocIdSetBuilder` takes doc IDs in random order, with potential duplicates, and creates a `DocIdSet` that can iterate over doc IDs in order without any duplicates. If you index a multi-valued field with points, a very large segment that has 2^30 docs might have 2^32 points matching a range query, which translates into 2^29 documents matching the query. So `DocIdSetBuilder#add` would be called 2^32 times and `DocIdSetBuilder#build` would result in a `DocIdSet` that has 2^29 documents. This value counts calls to `DocIdSetBuilder#add`, not documents, hence the `long`.

The naming may be wrong here, as the `grow` name probably suggests a number of docs rather than a number of calls to `add`, similarly to how `ArrayUtil#grow` is about the number of items in the array, not the number of times you set an index. Hopefully renaming it to `prepareAdd(long numCallsToAdd)` or something along these lines would help clarify.
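To make the distinction between "calls to `add`" and "documents in the result" concrete, here is a minimal, self-contained sketch. It is not Lucene's `DocIdSetBuilder`: the class `ToyDocIdSetBuilder`, its array-backed buffer, and the `numCallsToAdd()` accessor are all assumptions made for illustration, and `prepareAdd` simply borrows the name suggested above.

```java
import java.util.Arrays;

/** Toy stand-in for illustration only; NOT Lucene's DocIdSetBuilder. */
final class ToyDocIdSetBuilder {
    private int[] buffer = new int[0];
    private int size = 0;
    // Counts calls to add(), duplicates included, so it can exceed the doc count.
    private long numCallsToAdd = 0;

    /** Hypothetical name from the comment: announce how many add() calls will follow. */
    void prepareAdd(long numCalls) {
        long needed = (long) size + numCalls;
        if (needed > Integer.MAX_VALUE - 8) {
            // A real builder would need another strategy here; the toy just refuses.
            throw new IllegalArgumentException("toy buffer cannot hold " + needed + " entries");
        }
        if (needed > buffer.length) {
            buffer = Arrays.copyOf(buffer, (int) needed);
        }
    }

    /** Accepts doc IDs in any order, with duplicates. */
    void add(int doc) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(8, size * 2));
        }
        buffer[size++] = doc;
        numCallsToAdd++;
    }

    /** Returns the doc IDs sorted and without duplicates. */
    int[] build() {
        int[] sorted = Arrays.copyOf(buffer, size);
        Arrays.sort(sorted);
        int unique = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (unique == 0 || sorted[i] != sorted[unique - 1]) {
                sorted[unique++] = sorted[i];
            }
        }
        return Arrays.copyOf(sorted, unique);
    }

    long numCallsToAdd() {
        return numCallsToAdd;
    }

    public static void main(String[] args) {
        // 8 matching points spread over 3 multi-valued documents.
        int[] matchingPoints = {7, 3, 7, 1, 3, 7, 1, 3};
        ToyDocIdSetBuilder builder = new ToyDocIdSetBuilder();
        builder.prepareAdd(matchingPoints.length);
        for (int doc : matchingPoints) {
            builder.add(doc);
        }
        int[] docs = builder.build();
        System.out.println(builder.numCallsToAdd() + " calls to add, "
            + docs.length + " docs: " + Arrays.toString(docs));

        // At the scale discussed above, the call count no longer fits in an int.
        long calls = 1L << 32;
        System.out.println("2^32 as long: " + calls + ", narrowed to int: " + (int) calls);
    }
}
```

In the toy run, eight calls to `add` yield three documents. The last line shows why the counter needs to be a `long`: narrowing 2^32 to an `int` gives 0.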