[jira] [Commented] (SOLR-13890) Add postfilter support to {!terms} queries

Jason Gerlowski (Jira) Tue, 07 Jan 2020 04:42:57 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009681#comment-17009681
 ]


Jason Gerlowski commented on SOLR-13890:
----------------------------------------

Per a request offline by [~dsmiley], I've created a PR for all subsequent 
development on this issue (as it makes in-line review easier).  Any patches 
attached here predate this PR: https://github.com/apache/lucene-solr/pull/1151

I'll reply to Mikhail's comments here, but maybe further review should be done 
on the PR itself:

bq. adding argument to method QueryMethod.makeFilter(String fname, BytesRef[] 
bytesRefs, SolrParams localParams) is not something which is backward 
compatible, and might frustrate other devs.
Backwards compatible?  Does that apply here?  We aim to keep backcompat for our 
public interfaces, plugins, and SolrJ, but this is none of those.  It's a 
private nested class not visible outside this one file.  Is there some reason 
I'm missing why we should care about backcompat here?

bq. TopLevelDocValuesTermsQuery uses OrdinalMap via getSlowAtomicReader(). It 
might be clearer to iterate persegment
Maybe I'm misreading your suggestion, but the whole purpose of this issue is 
that we're trying to avoid per-segment iteration for performance reasons.  I'd 
be happy to change gears if you have an alternative that has comparable 
performance to what we're seeing with the global iteration, but our perf tests 
have borne out global-iteration as the more efficient approach at large numbers 
of query terms.

bq. Also, this query relies on SolrIndexSearcher, but iirc even in Solr queries 
sometimes invoked with Lucene's Searcher. There's some issues with such cast
I'm still reading through SOLR-6357 to understand the exact context here.  But 
the cast to SolrIndexSearcher in QParserPlugins and query implementations is 
very common in our codebase (see below).  If you think it'd be safer, I can add 
an {{instanceof}} check there, and try to fall back to the per-segment approach 
if we ever get a non-SolrIndexSearcher.  But from how frequently this is done 
in our query implementations, I'm not sure the danger is still there?

{code}
➜  lucene-solr git:(SOLR_13890) ✗ grep -rIl "(SolrIndexSearcher)[ ]\?searcher" .
./solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java
./solr/core/src/java/org/apache/solr/search/TextLogisticRegressionQParserPlugin.java
./solr/core/src/java/org/apache/solr/search/SignificantTermsQParserPlugin.java
./solr/core/src/java/org/apache/solr/search/IGainTermsQParserPlugin.java
./solr/core/src/java/org/apache/solr/search/GraphTermsQParserPlugin.java
./solr/core/src/java/org/apache/solr/search/join/GraphQuery.java
./solr/core/src/java/org/apache/solr/search/join/HashRangeQuery.java
./solr/core/src/java/org/apache/solr/search/join/XCJFQuery.java
./solr/core/src/java/org/apache/solr/search/QueryContext.java
./solr/core/src/java/org/apache/solr/search/ReRankCollector.java
./solr/core/src/java/org/apache/solr/search/HashQParserPlugin.java
./solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java
./solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java
./solr/core/src/java/org/apache/solr/query/SolrRangeQuery.java
./solr/core/src/java/org/apache/solr/query/FilterQuery.java
{code}
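
For illustration, the {{instanceof}} fallback mentioned above might look roughly like the sketch below. The types here (PlainSearcher, TopLevelSearcher, Strategy) are hypothetical stand-ins so the example is self-contained; they are not the real Lucene/Solr classes, and the actual change on the PR may differ.

```java
// Hypothetical stand-ins for illustration only; NOT the real Lucene/Solr classes.
class PlainSearcher {}                          // plays the role of Lucene's IndexSearcher
class TopLevelSearcher extends PlainSearcher {} // plays the role of SolrIndexSearcher

class SearcherFallbackSketch {
    enum Strategy { TOP_LEVEL, PER_SEGMENT }

    // Use the top-level approach only when the searcher really is the
    // Solr-style searcher; otherwise fall back instead of casting blindly.
    static Strategy chooseStrategy(PlainSearcher searcher) {
        if (searcher instanceof TopLevelSearcher) {
            return Strategy.TOP_LEVEL;
        }
        return Strategy.PER_SEGMENT;
    }

    public static void main(String[] args) {
        System.out.println(chooseStrategy(new TopLevelSearcher()));
        System.out.println(chooseStrategy(new PlainSearcher()));
    }
}
```

The point of the check is only that an unexpected searcher type degrades to the slower path rather than failing with a ClassCastException.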

> Add postfilter support to {!terms} queries
> ------------------------------------------
>
>                 Key: SOLR-13890
>                 URL: https://issues.apache.org/jira/browse/SOLR-13890
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: master (9.0)
>            Reporter: Jason Gerlowski
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, Screen Shot 2020-01-02 
> at 2.25.12 PM.png, post_optimize_performance.png, 
> toplevel-tpi-perf-comparison.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.
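
To make the bitset idea in the description above concrete, here is a minimal sketch in plain Java. It deliberately does not use Lucene's SortedSetDocValues: a sorted String array stands in for the field's term dictionary (a term's index is its ordinal), each document is represented by an array of ordinals, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of the "ordinal bitset" idea; not the Solr implementation.
class OrdinalBitsetPostFilterSketch {
    // Bit i is set when the term with ordinal i is one of the query terms.
    private final BitSet queryOrds = new BitSet();

    // sortedTerms: sorted, de-duplicated term dictionary (index == ordinal).
    // queryTerms: the terms being checked for; unknown terms are ignored.
    OrdinalBitsetPostFilterSketch(String[] sortedTerms, String[] queryTerms) {
        for (String term : queryTerms) {
            int ord = Arrays.binarySearch(sortedTerms, term);
            if (ord >= 0) {
                queryOrds.set(ord);
            }
        }
    }

    // A document matches if any of its ordinals is set in the bitset.
    boolean matches(int[] docOrds) {
        for (int ord : docOrds) {
            if (queryOrds.get(ord)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "cherry", "date"};
        String[] query = {"banana", "date", "zebra"};
        OrdinalBitsetPostFilterSketch filter =
            new OrdinalBitsetPostFilterSketch(dictionary, query);
        System.out.println(filter.matches(new int[] {0, 2})); // apple, cherry
        System.out.println(filter.matches(new int[] {1}));    // banana
    }
}
```

The term lookups happen once, when the filter is built; the per-document check is then only a bit test per ordinal, which is what makes the approach attractive when the term list is long.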



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
