javanna commented on PR #13542:
URL: https://github.com/apache/lucene/pull/13542#issuecomment-2243620253

   I did some work on this draft PR and made the fixes around hit-counting 
early termination and caching more future-proof. They are now contained in 
`TotalHitCountCollectorManager`, which feels like a good fit.
   
   I need to investigate recurring test failures on `TestSortRandom`.
   
   I am getting some facets failures too, which is expected, given that I have 
not updated the facets code in any way. I need help figuring out a way forward. 
`FacetsCollector` has a separate single-segment code path, and I see different 
assertions tripping if I try to adjust some of that logic. In the end I 
get the "Sub-iterators of ConjunctionDISI are not on the same document!" error, 
which sounds accurate, given that each partition gets its own iterator and they 
would indeed all be on different docs. I am not sure how to proceed; we may 
want to opt out of intra-segment concurrency in this case. 
There's also an issue with `DrillSideways`, which has a bulk scorer that 
ignores the provided range of doc ids, and even asserts that the full range is 
provided. @mikemccand would you have ideas on what to do here?
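
   To make the doc id range point concrete, here is a minimal plain-Java 
sketch of what honouring a `[min, max)` range means for a scorer. It is not 
Lucene code: `RangeScoringSketch`, `scoreRange` and the `int[]` of matching doc 
ids are hypothetical stand-ins chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class RangeScoringSketch {

  /**
   * Hypothetical range-aware scoring loop: collects only the matching doc ids
   * that fall in [min, max) and returns the first matching doc id at or after
   * max, or Integer.MAX_VALUE if there is none.
   */
  static int scoreRange(int[] sortedMatches, int min, int max, IntConsumer collector) {
    int i = 0;
    // Skip matches that belong to earlier partitions.
    while (i < sortedMatches.length && sortedMatches[i] < min) {
      i++;
    }
    // Collect matches that fall inside this partition only.
    while (i < sortedMatches.length && sortedMatches[i] < max) {
      collector.accept(sortedMatches[i]);
      i++;
    }
    return i < sortedMatches.length ? sortedMatches[i] : Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    int[] matches = {1, 5, 8, 12, 20};
    List<Integer> first = new ArrayList<>();
    List<Integer> second = new ArrayList<>();
    // Two partitions of the same segment, each scored independently.
    scoreRange(matches, 0, 10, first::add);
    scoreRange(matches, 10, 21, second::add);
    System.out.println(first + " " + second);
  }
}
```

   A scorer that ignores `min` and `max` would collect every match once per 
partition, so hits would be counted multiple times when a segment is split.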
   
   It would be good to align and decide on terminology; I have no strong 
opinions. To me, the `LeafReaderContextPartition` terminology can coexist with 
the existing slice terminology. A slice of an index is a group of segments of 
that index. A leaf partition is a subset of a segment, identified by a 
`LeafReaderContext` and a range of doc ids. This is only a starting point and I 
am happy to hear what ideas others have around this.
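
   As a rough illustration of the leaf partition idea, the sketch below splits 
one segment into contiguous doc id ranges. It is plain Java and does not use 
Lucene types: `Partition`, `segmentOrd` and the `partition` helper are 
hypothetical names, with `segmentOrd` standing in for a `LeafReaderContext`. 
It is not the API proposed in the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class LeafPartitionSketch {

  /** Hypothetical leaf partition: a segment plus a doc id range [minDocId, maxDocId). */
  record Partition(int segmentOrd, int minDocId, int maxDocId) {}

  /**
   * Splits the doc id space [0, maxDoc) of one segment into at most
   * numPartitions contiguous, non-overlapping ranges of near-equal size.
   */
  static List<Partition> partition(int segmentOrd, int maxDoc, int numPartitions) {
    List<Partition> partitions = new ArrayList<>();
    int base = maxDoc / numPartitions;
    int remainder = maxDoc % numPartitions;
    int start = 0;
    for (int i = 0; i < numPartitions; i++) {
      // The first `remainder` partitions each take one extra doc.
      int size = base + (i < remainder ? 1 : 0);
      if (size == 0) {
        continue; // more partitions requested than docs available
      }
      partitions.add(new Partition(segmentOrd, start, start + size));
      start += size;
    }
    return partitions;
  }

  public static void main(String[] args) {
    // A 10-doc segment split three ways; a slice could then hold any of these.
    for (Partition p : partition(0, 10, 3)) {
      System.out.println(p);
    }
  }
}
```

   Under this reading, a slice becomes a group of such partitions rather than a 
group of whole segments, which is why the two terms can coexist.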
   
   The next thing I'd like to start thinking about is whether support for 
intra-segment concurrency requires breaking changes, as we have a 
chance to make those with Lucene 10. I wonder specifically whether we are going to 
be able to hide the notion of leaf partition and the range of doc ids like I 
currently do, or whether that will need to become more of a first-class citizen, for 
instance when we want to remove duplicated work across partitions. I need to 
do a bit more digging to form an opinion on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

