jainankitk commented on issue #13084: URL: https://github.com/apache/lucene/issues/13084#issuecomment-3500159963
Thanks @salvatorecampagna for elaborating on the approach. The execution plan looks pretty reasonable to me. A couple of questions to understand this better:

> Note that runtime-only requires an O(maxDoc) scan during segment open to convert from the dense on-disk format to sparse in-memory representation for sparse cases.

I am assuming that the O(maxDoc) cost is only incurred when we actually build the sparse in-memory representation. Also, does this involve reading additional data from disk during segment open? If yes, it should be bounded by the size of the live docs, right?

> This would validate (and potentially adjust) the 20% starting point.

Since we are looking at `HistogramCollector` as one of the primary use cases, 20% is slightly high imo. For example, if we have 1M documents in a segment and 200k deleted documents, it might be better to just take the non-efficient path instead of collecting using `PointTreeTraversal` first and then retroactively correcting by iterating over the 200k deleted documents and accessing their doc values. That being said, we can come up with a better threshold based on the benchmark results.
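To make the trade-off above concrete, here is a minimal, self-contained sketch of the "collect everything, then subtract deleted docs" idea. It does not use Lucene's API: the class and method names are hypothetical, `java.util.BitSet` stands in for live docs, a plain `long[]` stands in for doc values, and the simple counting loop stands in for the point-tree fast path (which in practice would not visit every document). The correction step costs one doc-values lookup per deleted document, which is why the deleted-doc ratio matters.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration only; none of these names exist in Lucene.
public class DeletedDocsCorrectionSketch {

    // Stand-in for the fast path: bucket counts over ALL docs, deleted ones included.
    static long[] countAllDocs(long[] values, long interval, int numBuckets) {
        long[] counts = new long[numBuckets];
        for (long v : values) {
            counts[(int) (v / interval)]++;
        }
        return counts;
    }

    // Retroactive correction: visit only the deleted docs (clear bits in liveDocs)
    // and remove each one's contribution from its bucket.
    static void subtractDeleted(long[] counts, long[] values, BitSet liveDocs,
                                int maxDoc, long interval) {
        for (int doc = liveDocs.nextClearBit(0); doc < maxDoc;
             doc = liveDocs.nextClearBit(doc + 1)) {
            counts[(int) (values[doc] / interval)]--;
        }
    }

    // Assumed gating rule: use the fast path only if the deleted ratio is small enough.
    static boolean useFastPath(int maxDoc, int numDeleted, double maxDeletedRatio) {
        return numDeleted <= maxDeletedRatio * maxDoc;
    }

    public static void main(String[] args) {
        long[] values = {5, 15, 25, 15, 5};   // one value per doc
        BitSet liveDocs = new BitSet(values.length);
        liveDocs.set(0, values.length);
        liveDocs.clear(1);                    // docs 1 and 4 are deleted
        liveDocs.clear(4);

        long[] counts = countAllDocs(values, 10, 3);
        subtractDeleted(counts, values, liveDocs, values.length, 10);
        System.out.println(Arrays.toString(counts));
    }
}
```

With 1M documents and 200k deletions, `subtractDeleted` would perform 200k doc-values lookups per segment, so a lower `maxDeletedRatio` than 0.20 may be the better starting point; the right value would have to come from the benchmarks.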
