[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007566#comment-17007566
 ] 

David Smiley commented on SOLR-13890:
-------------------------------------

Thanks for the benchmarks and for posting the charts.  This is really helpful!

You characterize the two above as "Existing Impl" vs "Postfilter Impl".  As 
I've been saying (and validated by your benchmark) the "Postfilter" aspect is 
not the interesting part as it's roughly equivalent to TwoPhaseIterator (which 
is in the existing impl).  The differentiator is per-segment algorithm vs 
top-level algorithm.  And you could do that via a TwoPhaseIterator and avoid 
Solr's old PostFilter.  I consider PostFilter to be a relic to be avoided if 
you can do so.  There's special code in SolrIndexSearcher about PostFilter to 
maintain and perhaps one day we can make it go away to reduce complexity.  
Continuing to use PostFilter needlessly will make that harder.  I'm not going 
to veto use of PostFilter if you want to stick with what you have already 
coded, but I don't recommend it.

I suggest you:
* retain the existing implementation
* add your new implementation (_potentially_ redeveloped to not use PostFilter)
* keep method=docValuesTermsFilter but have it choose between these two 
implementations based on the number of terms; 700 being the pivot.  Maybe make 
the pivot configurable and/or add a mechanism to explicitly choose one or the 
other.  FWIW TermInSetQuery uses a non-configurable heuristic so don't feel you 
_have_ to make the threshold here configurable.
* manually verify that an explicit cache=true/false has the intended effect for 
both impls
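
A minimal sketch of the dispatch suggested in the list above, not code from
any patch.  The class, enum, and constant names are hypothetical.  It assumes
the new top-level implementation is the one to use strictly above the pivot
(as the issue description suggests for large term counts), and that an
explicit choice, when given, overrides the heuristic.

```java
class TermsImplChooser {
    /** Hypothetical labels for the two implementations discussed above. */
    public enum Impl { PER_SEGMENT, TOP_LEVEL }

    /** Pivot read off the benchmark charts; the constant name is made up. */
    public static final int DEFAULT_PIVOT = 700;

    /**
     * Picks an implementation. An explicit choice (may be null) always wins;
     * otherwise the term count is compared against the pivot.
     * Assumption: the top-level impl is used strictly above the pivot.
     */
    public static Impl choose(int numTerms, int pivot, Impl explicit) {
        if (explicit != null) {
            return explicit;
        }
        return numTerms > pivot ? Impl.TOP_LEVEL : Impl.PER_SEGMENT;
    }

    public static void main(String[] args) {
        System.out.println(choose(50, DEFAULT_PIVOT, null));
        System.out.println(choose(5000, DEFAULT_PIVOT, null));
        System.out.println(choose(5000, DEFAULT_PIVOT, Impl.PER_SEGMENT));
    }
}
```

Keeping the pivot as a parameter leaves room for either a fixed heuristic or
a configurable threshold without changing the callers.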

I don't know whether to change the _default_ cache local-param to false.  Seems 
like kind of a larger issue... like if we wanted this then it ought to apply to 
any query that does not use an index, not just some queries this parser 
produces.  On the other hand, it's nice to have a simple understood rule that 
all filter queries are cached unless you say otherwise.  Shrug.

Ideally this QParser would be a bit smarter about choosing an optimal "method". 
 If there is no terms index but there is docValues, then you should get a 
docValues impl instead of what appears to be a no-results query (ouch!).  If 
there's both data structures, and the user sets cache=false and the number of 
terms is large, then the method should use docValues.  Of course this is out of 
scope.
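
For illustration only, the selection rules in the paragraph above could be
sketched as follows.  Everything here is hypothetical: the class and enum
names, the boolean inputs, and the "large" threshold, which the paragraph
does not quantify.  TERMS_FILTER stands in for whatever index-based method
the parser would otherwise use.

```java
class TermsMethodChooser {
    /** Hypothetical stand-ins for an index-based and a docValues-based method. */
    public enum Method { TERMS_FILTER, DOC_VALUES_TERMS_FILTER }

    /**
     * @param indexed        the field has a terms index
     * @param docValues      the field has docValues
     * @param cache          effective value of the cache local-param
     * @param numTerms       number of terms in the query
     * @param largeThreshold assumed cut-off for "large"; not given in the text
     */
    public static Method choose(boolean indexed, boolean docValues,
                                boolean cache, int numTerms, int largeThreshold) {
        if (!indexed && !docValues) {
            throw new IllegalArgumentException(
                "field has neither a terms index nor docValues");
        }
        if (!indexed) {
            // Only docValues exist: avoid an index-based query that matches nothing.
            return Method.DOC_VALUES_TERMS_FILTER;
        }
        if (docValues && !cache && numTerms > largeThreshold) {
            // Both structures exist, uncached, many terms: prefer docValues.
            return Method.DOC_VALUES_TERMS_FILTER;
        }
        return Method.TERMS_FILTER;
    }
}
```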

> Add postfilter support to {!terms} queries
> ------------------------------------------
>
>                 Key: SOLR-13890
>                 URL: https://issues.apache.org/jira/browse/SOLR-13890
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: master (9.0)
>            Reporter: Jason Gerlowski
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png, 
> post_optimize_performance.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at their 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
