[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007636#comment-17007636
 ] 

Joel Bernstein edited comment on SOLR-13890 at 1/3/20 5:30 PM:
---------------------------------------------------------------

My only problem with the scorer-based filter is its use of the filter cache. 
Traditionally, filters use the filter cache and postfilters do not. This is 
actually a nice split: it's clean, easy to implement, and easy to document. 

For these types of large filters, most of the time, the filter cache is not fit 
for purpose. But it's there and it traps people; I've seen it over and over 
again.

The issue is that as indexes get larger, the top-level filter cache becomes 
untenable because each filter is built against the entire index. Then, when even 
one document is indexed, the entire cache is dropped and needs to be warmed 
again. This is a huge problem for use cases like access control, where large 
filters need to be rebuilt for thousands of users. This drains memory and CPU 
and causes GC issues and performance problems.

So I have traditionally turned to high-performance postfilters that are fast 
enough that caching is not needed for these types of problems. 

Eliminating the postfilter means, in my opinion, overloading traditional 
filters so that they have to deal intelligently with caches, or asking the 
user to dive into the details of caching.

The other option is to add a segment-level filter cache that is actually fit 
for purpose and then standardize on the filter-based approach. Until we do 
this, though, postfilters provide a simple way to get the behavior that is 
needed for these types of large filters.
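
To make the cache argument concrete, here is a toy model in plain Java. It is 
not Solr or Lucene code: the class names, the segment names, and the 
three-segment index are all assumptions made for illustration, and deletes and 
merges are ignored. It only counts how many cached filter entries survive after 
one new document lands in a new segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Solr code) of filter cache invalidation. */
public class FilterCacheSketch {

    /** Top-level cache: each entry covers the whole index. */
    static final class TopLevelCache {
        private final Map<String, long[]> entries = new HashMap<>();

        void put(String filter, long[] bits) {
            entries.put(filter, bits);
        }

        /** Any change to the index invalidates every entry. */
        void onCommit() {
            entries.clear();
        }

        int size() {
            return entries.size();
        }
    }

    /** Segment-level cache: entries are keyed by segment, then by filter. */
    static final class SegmentLevelCache {
        private final Map<String, Map<String, long[]>> bySegment = new HashMap<>();

        void put(String segment, String filter, long[] bits) {
            bySegment.computeIfAbsent(segment, s -> new HashMap<>()).put(filter, bits);
        }

        /** Only entries for segments that no longer exist are dropped. */
        void onCommit(List<String> liveSegments) {
            bySegment.keySet().retainAll(liveSegments);
        }

        int size() {
            return bySegment.values().stream().mapToInt(Map::size).sum();
        }
    }

    /** Returns {top-level entries kept, segment-level entries kept}. */
    public static int[] survivorsAfterOneCommit(int users) {
        TopLevelCache top = new TopLevelCache();
        SegmentLevelCache seg = new SegmentLevelCache();
        List<String> segments = new ArrayList<>(Arrays.asList("seg0", "seg1", "seg2"));

        // One access-control filter per user, cached in both models.
        for (int u = 0; u < users; u++) {
            String filter = "acl:user" + u;
            top.put(filter, new long[1]);
            for (String s : segments) {
                seg.put(s, filter, new long[1]);
            }
        }

        // One new document is indexed and ends up in a new segment.
        segments.add("seg3");
        top.onCommit();
        seg.onCommit(segments);
        return new int[] {top.size(), seg.size()};
    }

    public static void main(String[] args) {
        int[] kept = survivorsAfterOneCommit(1000);
        System.out.println("top-level entries kept: " + kept[0]);
        System.out.println("segment-level entries kept: " + kept[1]);
    }
}
```

In this model the top-level cache has to rebuild all 1000 user filters after 
the commit, while the segment-level cache keeps its entries for the three 
unchanged segments and only needs new entries for the new segment.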

 



> Add postfilter support to {!terms} queries
> ------------------------------------------
>
>                 Key: SOLR-13890
>                 URL: https://issues.apache.org/jira/browse/SOLR-13890
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: master (9.0)
>            Reporter: Jason Gerlowski
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png, 
> post_optimize_performance.png
>
>
> There are some use cases where it would be nice if the "terms" qparser 
> created a query that could be run as a postfilter.  In particular, when users 
> are checking for hundreds or thousands of terms, a postfilter implementation 
> can be more performant than the standard processing.
> With this issue, I'd like to propose a postfilter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the postfilter can look at its 
> doc values for the field in question and check them efficiently against the 
> constructed bitset.
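
The approach in the issue description can be sketched in plain Java. This is a 
simplified stand-in, not the attached patch: a sorted String array plays the 
role of the doc-values term dictionary, Arrays.binarySearch plays the role of 
an ordinal lookup, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Simplified sketch (hypothetical names) of a terms postfilter over ordinals. */
public class TermsPostFilterSketch {
    private final int[][] docOrds;   // per-document term ordinals (stand-in for doc values)
    private final BitSet wantedOrds; // one bit per ordinal in the term dictionary

    public TermsPostFilterSketch(String[] sortedDictionary, int[][] docOrds,
                                 List<String> queryTerms) {
        this.docOrds = docOrds;
        this.wantedOrds = new BitSet(sortedDictionary.length);
        // Done once, at filter creation: map each query term to its ordinal.
        for (String term : queryTerms) {
            int ord = Arrays.binarySearch(sortedDictionary, term);
            if (ord >= 0) { // terms missing from the dictionary can never match
                wantedOrds.set(ord);
            }
        }
    }

    /** Per-document check: does any of the document's ordinals hit the bitset? */
    public boolean accept(int docId) {
        for (int ord : docOrds[docId]) {
            if (wantedOrds.get(ord)) {
                return true;
            }
        }
        return false;
    }

    /** Runs the check over documents that already matched the main query. */
    public List<Integer> filter(List<Integer> candidates) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : candidates) {
            if (accept(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] dictionary = {"alpha", "beta", "delta", "gamma"}; // sorted
        int[][] docOrds = {{0}, {1, 3}, {2}, {}};                  // docs 0..3
        TermsPostFilterSketch f = new TermsPostFilterSketch(
                dictionary, docOrds, Arrays.asList("beta", "delta", "zeta"));
        System.out.println(f.filter(Arrays.asList(0, 1, 2, 3))); // prints [1, 2]
    }
}
```

The per-document cost is a few bit tests, independent of how many terms were 
in the query, which is why no cached result is needed in this sketch.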



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

