javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2243620253
I did some work on this draft PR and made the fixes around hit count early termination and caching more future-proof. They are now confined to `TotalHitCountCollectorManager`, which feels like a good fit.

I need to investigate recurring test failures in `TestSortRandom`. I am also getting some facets failures, which are expected given that I have not updated the facets code in any way. I need help figuring out a way forward. `FacetsCollector` has a separate single-segment code path, and I see different assertions tripping if I try to adjust some of that logic. In the end I get the "Sub-iterators of ConjunctionDISI are not on the same document!" error, which sounds accurate given that each partition gets its own iterator and they would indeed all be on different docs. I am not sure how to proceed; perhaps we want to opt out of intra-segment concurrency in this case. There's also an issue with `DrillSideways`, which has a bulk scorer that ignores the provided range of doc ids, and even asserts that the full range is provided. @mikemccand would you have ideas on what to do here?

It would be good to align and decide on terminology: I have no strong opinions. To me, the `LeafReaderContextPartition` terminology can coexist with the existing slice terminology. A slice of an index is a group of segments of that index. A leaf partition is a subset of a segment, identified by a `LeafReaderContext` and a range of doc ids. This is only a starting point and I am happy to hear what ideas others have around this.

The next thing I'd like to start thinking about is whether support for intra-segment concurrency requires breaking changes, as we have a chance to make those with Lucene 10. I wonder specifically whether we will be able to hide the notion of leaf partition and the range of doc ids like I currently do, or whether that will need to become more of a first-class citizen, for instance when we want to remove duplicated work across partitions.
I need to do a bit more digging to form an opinion on this.
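
To make the slice versus leaf partition terminology concrete, here is a minimal, self-contained sketch of the idea. It does not use Lucene classes: `Segment`, `Partition`, `partition` and `count` are hypothetical names made up for illustration, and the even split of doc ids is an assumption, not necessarily how the PR partitions segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class LeafPartitionSketch {

    /** Hypothetical stand-in for a segment; only its ordinal and doc count matter here. */
    public static final class Segment {
        public final int ord;
        public final int maxDoc;

        public Segment(int ord, int maxDoc) {
            this.ord = ord;
            this.maxDoc = maxDoc;
        }
    }

    /** Hypothetical leaf partition: a segment plus a half-open doc id range [minDocId, maxDocId). */
    public static final class Partition {
        public final Segment segment;
        public final int minDocId;
        public final int maxDocId;

        public Partition(Segment segment, int minDocId, int maxDocId) {
            this.segment = segment;
            this.minDocId = minDocId;
            this.maxDocId = maxDocId;
        }
    }

    /** Splits one segment into up to numPartitions contiguous, non-overlapping doc id ranges. */
    public static List<Partition> partition(Segment segment, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        List<Partition> partitions = new ArrayList<>();
        int base = segment.maxDoc / numPartitions;
        int remainder = segment.maxDoc % numPartitions;
        int start = 0;
        for (int i = 0; i < numPartitions; i++) {
            // The first `remainder` partitions take one extra doc each.
            int end = start + base + (i < remainder ? 1 : 0);
            if (end > start) { // skip empty ranges when the segment is tiny
                partitions.add(new Partition(segment, start, end));
            }
            start = end;
        }
        return partitions;
    }

    /** Counts matching docs, visiting only the doc ids that belong to this partition. */
    public static int count(Partition partition, IntPredicate matches) {
        int hits = 0;
        for (int doc = partition.minDocId; doc < partition.maxDocId; doc++) {
            if (matches.test(doc)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Segment segment = new Segment(0, 10);
        int total = 0;
        for (Partition p : partition(segment, 3)) {
            int hits = count(p, doc -> doc % 2 == 0);
            System.out.println("segment " + p.segment.ord + " docs [" + p.minDocId
                + ", " + p.maxDocId + "): " + hits + " hits");
            total += hits;
        }
        System.out.println("total hits: " + total);
    }
}
```

Each partition walks its own range independently, so per-partition results only need to be summed at the end. Code that assumes a single shared iterator position per segment, or that ignores the provided doc id range, would not fit this model without changes.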