epotyom commented on issue #14619: URL: https://github.com/apache/lucene/issues/14619#issuecomment-2875780860
In my opinion, the new approach can eventually do everything the current approach does, but there are quite a few gaps to cover; see Milestone 2 in [the plan document](https://docs.google.com/document/d/1PF9KWYboy6terrPp8Frizlkp1ee09RX-DsuZrBux-Oo/edit?usp=sharing). Whether we want to deprecate the old functionality after that is a good question.

The only benefit of pre-collecting docID sets that I know of is that, in theory, a user can do something like find the top 1 book author (with taxonomy facets) and then count docs per price range for that author's matching books by reusing the docID set + [fastMatchQuery](https://github.com/apache/lucene/blob/0ea423e3025893fa1ce9a2633c59a7578b8478ea/lucene/facet/src/java/org/apache/lucene/facet/FacetCountsWithFilterQuery.java#L41-L45). I don't know if anyone actually does something like that. Also, we can implement similar functionality for the new approach by making it compatible with pre-collected docID sets; I've just added that task to Milestone 2.

The other potential concern is performance. While in general the new approach seems to be more efficient because it doesn't require intermediate docID sets, there are some cases where the old approach is faster, e.g. for taxonomy facets when a user counts for a MatchAllDocs query on a facet index field that is responsible for creating the majority of taxonomy facet labels; see [luceneutil #325](https://github.com/mikemccand/luceneutil/pull/325#issuecomment-2580729914) for details. That said, I think we can find a way to optimize CountFacetRecorder for dense counting. As another example, the implementation of [long values facet counts](https://github.com/apache/lucene/blob/0ea423e3025893fa1ce9a2633c59a7578b8478ea/lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/LongValueFacetCutter.java) for the new approach is also very inefficient, although Milestone 0 has an idea to try that could make it faster.
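To make the docID set reuse scenario concrete, here is a toy sketch in plain Java. It deliberately does not use any Lucene classes: the class name `DocIdSetReuseSketch`, the `AUTHOR` and `PRICE` arrays, and the three price buckets are all made up for illustration, and the author filter in step 3 only loosely plays the role that fastMatchQuery plays in the real code.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Toy model only: plain arrays stand in for an index; none of this is Lucene API. */
final class DocIdSetReuseSketch {
  // Hypothetical per-document values, indexed by docID.
  static final String[] AUTHOR = {"Ann", "Bob", "Ann", "Ann", "Bob", "Cy"};
  static final int[] PRICE = {5, 12, 25, 8, 30, 15};

  /** Step 1: run the "query" once and remember which docIDs matched. */
  static BitSet collect(IntPredicate query) {
    BitSet hits = new BitSet(AUTHOR.length);
    for (int doc = 0; doc < AUTHOR.length; doc++) {
      if (query.test(doc)) {
        hits.set(doc);
      }
    }
    return hits;
  }

  /** Step 2: count authors over the saved docID set and return the most frequent one. */
  static String topAuthor(BitSet hits) {
    Map<String, Integer> counts = new HashMap<>();
    for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
      counts.merge(AUTHOR[doc], 1, Integer::sum);
    }
    String best = null;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      int bestCount = best == null ? -1 : counts.get(best);
      // Break ties by name so the result does not depend on HashMap iteration order.
      if (e.getValue() > bestCount
          || (e.getValue() == bestCount && e.getKey().compareTo(best) < 0)) {
        best = e.getKey();
      }
    }
    return best;
  }

  /**
   * Step 3: reuse the same docID set (the query is not run again) and count
   * price buckets {under 10, 10 to 19, 20 and up} for one author's books.
   */
  static int[] priceRangeCounts(BitSet hits, String author) {
    int[] buckets = new int[3];
    for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
      if (!AUTHOR[doc].equals(author)) {
        continue; // cheap filter applied on top of the saved hits
      }
      int bucket = PRICE[doc] < 10 ? 0 : (PRICE[doc] < 20 ? 1 : 2);
      buckets[bucket]++;
    }
    return buckets;
  }

  public static void main(String[] args) {
    BitSet hits = collect(doc -> PRICE[doc] <= 25); // stand-in for the user's query
    String top = topAuthor(hits);
    int[] ranges = priceRangeCounts(hits, top);
    System.out.println(top + " " + java.util.Arrays.toString(ranges));
  }
}
```

The point of the sketch is only that steps 2 and 3 both read the `BitSet` produced once in step 1. In the new approach counting happens during collection, so a follow-up count like step 3 would presumably need either a second search or the compatibility with pre-collected docID sets mentioned above.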
Just to summarize, what I guess I'm saying is that eventually the new approach can replace the old one, but it will take time.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org