epotyom commented on issue #14619:
URL: https://github.com/apache/lucene/issues/14619#issuecomment-2875780860

   In my opinion, the new approach can eventually do everything the current 
approach does, but there are quite a few gaps to cover; see Milestone 2 in [the 
plan 
document](https://docs.google.com/document/d/1PF9KWYboy6terrPp8Frizlkp1ee09RX-DsuZrBux-Oo/edit?usp=sharing).
 Whether we want to deprecate the old functionality after that is a good 
question. The only benefit of pre-collecting docID sets that I know of is that, 
in theory, a user can do something like find the top book author (with taxonomy 
facets) and then count docs per price range for that author's matching books by 
reusing the docID set + 
[fastMatchQuery](https://github.com/apache/lucene/blob/0ea423e3025893fa1ce9a2633c59a7578b8478ea/lucene/facet/src/java/org/apache/lucene/facet/FacetCountsWithFilterQuery.java#L41-L45)
 . I don't know if anyone actually does something like that. Also, we can 
implement similar functionality for the new approach by making it compatible 
with pre-collected docID sets; I've just added that task to Milestone 2.
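
   To make the docID set reuse scenario concrete, here is a toy sketch in plain 
Java. It does not use the Lucene API: the class `DocIdSetReuseSketch`, its 
methods, and the in-memory author/price arrays are all hypothetical stand-ins. 
It only shows the shape of the idea: run the query once, keep the matching 
docIDs, then answer two dependent faceting questions from that same set, with a 
cheap per-doc filter playing a role loosely similar to fastMatchQuery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

public class DocIdSetReuseSketch {
    // Hypothetical toy "index": doc i has author AUTHORS[i] and price PRICES[i].
    static final String[] AUTHORS = {"Tolstoy", "Austen", "Tolstoy", "Austen", "Tolstoy", "Orwell"};
    static final double[] PRICES = {5.0, 12.0, 25.0, 8.0, 9.5, 30.0};

    /** Runs the "query" once and keeps the matching docIDs for later reuse. */
    static List<Integer> collect(IntPredicate query) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < AUTHORS.length; doc++) {
            if (query.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Step 1: counts authors over the pre-collected hits and returns the most frequent one. */
    static String topAuthor(List<Integer> hits) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc : hits) {
            counts.merge(AUTHORS[doc], 1, Integer::sum);
        }
        String best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int bestCount = best == null ? -1 : counts.get(best);
            // Ties are broken by author name so the result is deterministic.
            if (e.getValue() > bestCount
                    || (e.getValue() == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
            }
        }
        return best;
    }

    /**
     * Step 2: reuses the same hits, keeps only docs by the given author, and counts
     * them per price range. Range r covers prices below upperBounds[r] that did not
     * fall into an earlier range; upperBounds must be ascending.
     */
    static int[] countPriceRanges(List<Integer> hits, String author, double[] upperBounds) {
        int[] counts = new int[upperBounds.length];
        for (int doc : hits) {
            if (!AUTHORS[doc].equals(author)) {
                continue; // cheap filter applied on top of the pre-collected set
            }
            for (int r = 0; r < upperBounds.length; r++) {
                if (PRICES[doc] < upperBounds[r]) {
                    counts[r]++;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> hits = collect(doc -> true); // stand-in for the user's query
        String author = topAuthor(hits);
        int[] ranges = countPriceRanges(hits, author, new double[] {10.0, 20.0, Double.POSITIVE_INFINITY});
        System.out.println(author + " " + java.util.Arrays.toString(ranges));
    }
}
```

   The point of the sketch is that the second question depends on the answer to 
the first, so with collect-time counting alone the query would have to be run 
again; keeping the docID set avoids that second pass.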
   
   The other potential concern is performance. While in general the new 
approach seems to be more efficient, as it doesn't require intermediate docID 
sets, there are some cases where the old approach is faster, e.g. for taxonomy 
facets when a user counts for a MatchAllDocs query on a facet index field that 
is responsible for creating the majority of taxonomy facet labels; see [luceneutil 
#325](https://github.com/mikemccand/luceneutil/pull/325#issuecomment-2580729914)
 for details. That said, I think we can find a way to optimize 
CountFacetRecorder for dense counting. As another example, the implementation of 
[long values facet 
counts](https://github.com/apache/lucene/blob/0ea423e3025893fa1ce9a2633c59a7578b8478ea/lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/LongValueFacetCutter.java)
 for the new approach is also very inefficient, although Milestone 0 has an 
idea to try that could make it faster.
   
   To summarize, I think the new approach can eventually replace the old one, 
but it will take time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

