[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007566#comment-17007566 ]
David Smiley commented on SOLR-13890:
-------------------------------------

Thanks for the benchmarks and for posting the charts. This is really helpful!

You characterize the two above as "Existing Impl" vs "Postfilter Impl". As I've been saying (and as validated by your benchmark), the "Postfilter" aspect is not the interesting part, as it's roughly equivalent to TwoPhaseIterator (which is in the existing impl). The differentiator is per-segment algorithm vs top-level algorithm. And you could do that via a TwoPhaseIterator and avoid Solr's old PostFilter. I consider PostFilter to be a relic to be avoided if you can do so. There's special code in SolrIndexSearcher about PostFilter to maintain, and perhaps one day we can make it go away to reduce complexity. Continuing to use PostFilter needlessly will make that harder. I'm not going to veto use of PostFilter if you want to stick with what you have already coded, but I don't recommend it.

I suggest you:
* retain the existing implementation
* add your new implementation (_potentially_ redeveloped to not use PostFilter)
* keep method=docValuesTermsFilter but have it choose between these two implementations based on the number of terms; 700 being the pivot. Maybe make the pivot configurable and/or add a mechanism to explicitly choose one or the other. FWIW TermInSetQuery uses a non-configurable heuristic so don't feel you _have_ to make the threshold here configurable.
* manually verify that an explicit cache=true/false has the intended effect for both impls

I don't know whether to change the _default_ cache local-param to false. Seems like kind of a larger issue... like if we wanted this then it ought to apply to any query that does not use an index, not just some queries this parser produces. On the other hand, it's nice to have a simple, well-understood rule that all filter queries are cached unless you say otherwise. Shrug.

Ideally this QParser would be a bit smarter about choosing an optimal "method".
If there is no terms index but there is docValues, then you should get a docValues impl instead of what appears to be a no-results query (ouch!). If there are both data structures, and the user sets cache=false and the number of terms is large, then the method should use docValues. Of course this is out of scope.

> Add postfilter support to {!terms} queries
> ------------------------------------------
>
>                 Key: SOLR-13890
>                 URL: https://issues.apache.org/jira/browse/SOLR-13890
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: query parsers
>    Affects Versions: master (9.0)
>            Reporter: Jason Gerlowski
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch,
> SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png,
> post_optimize_performance.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a
> query that could be run as a postfilter. Particularly, when users are
> checking for hundreds or thousands of terms, a postfilter implementation can
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the
> {{docValuesTermsFilter}} "method". Postfilter creation can use a
> SortedSetDocValues object to populate a DV bitset with the "terms" being
> checked for. Each document run through the post-filter can look at their
> doc-values for the field in question and check them efficiently against the
> constructed bitset.
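
For illustration only, the ordinal-bitset check described in the issue summary and the term-count pivot suggested in the comment could be modelled roughly as below. This is a simplified sketch in plain Java, not Solr or Lucene code: the class name, the method names, the `Impl` enum and the `TOP_LEVEL_PIVOT` constant are all hypothetical, a sorted `String[]` stands in for the field's doc-values term dictionary, and treating counts above 700 as the top-level case is an assumption about which side of the pivot each implementation belongs on.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class TermsFilterSketch {
    // Pivot value mentioned in the comment; the constant name is hypothetical.
    static final int TOP_LEVEL_PIVOT = 700;

    enum Impl { PER_SEGMENT, TOP_LEVEL }

    // Assumption: term counts above the pivot use the top-level algorithm.
    static Impl chooseImpl(int numQueryTerms) {
        return numQueryTerms > TOP_LEVEL_PIVOT ? Impl.TOP_LEVEL : Impl.PER_SEGMENT;
    }

    // Builds a bitset with one bit per term ordinal, set when the term at
    // that ordinal is one of the query terms. sortedDictionary stands in for
    // the field's sorted term dictionary (ordinal == array index).
    static BitSet buildOrdinalBitSet(String[] sortedDictionary, List<String> queryTerms) {
        BitSet matching = new BitSet(sortedDictionary.length);
        for (String term : queryTerms) {
            int ord = Arrays.binarySearch(sortedDictionary, term);
            if (ord >= 0) {          // terms absent from the dictionary are skipped
                matching.set(ord);
            }
        }
        return matching;
    }

    // A document matches if any of its term ordinals is set in the bitset.
    static boolean matches(int[] docOrdinals, BitSet matching) {
        for (int ord : docOrdinals) {
            if (matching.get(ord)) {
                return true;
            }
        }
        return false;
    }
}
```

The lookup cost is paid once per query term when the bitset is built; after that, each document costs only one bit test per ordinal it holds, which is why the approach favours large term lists.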