[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007636#comment-17007636 ]
Joel Bernstein edited comment on SOLR-13890 at 1/3/20 5:29 PM:
---------------------------------------------------------------

My only problem with the scorer-based filter is its use of the filter cache. Traditionally, filters use the filter cache and postfilters don't. This is actually a nice split: it's clean, easy to implement and easy to document.

For these types of large filters, the filter cache is, most of the time, not fit for purpose. But it's there and it traps people; I've seen it over and over again. The issue is that as indexes grow, the top-level filter cache becomes untenable because each filter is built against the entire index. Then, when a single document is indexed, the entire cache is dropped and has to be warmed again. This is a huge problem for use cases like access control, where large filters need to be rebuilt for thousands of users. It drains memory and CPU and causes GC issues and performance problems.

So I have traditionally turned to high-performance postfilters, which are fast enough that caching is not needed for these types of problems. Eliminating the postfilter means, in my opinion, either overloading traditional filters so that they have to deal intelligently with caches, or asking the user to dive into the details of caching. The other option is to add a segment-level filter cache that is actually fit for purpose and then standardize on the filter-based approach. Until we do that, though, postfilters provide a simple way to get the behavior needed for these types of large filters.
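To make the caching argument concrete, here is a hypothetical toy model (not Solr's filterCache; `Segment`, `TopLevelCache` and `PerSegmentCache` are invented names). It assumes segments are immutable and that indexing one document adds one new segment. A top-level entry spans the whole index and is thrown away whenever a new searcher opens, while a segment-keyed entry survives as long as its segment does. The `work` counter records how many per-segment bitsets each cache has to compute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical toy model: compares the work a top-level cache and a
// segment-keyed cache redo after a single document is indexed.
public class FilterCacheSketch {

    // An immutable index segment (hypothetical model).
    record Segment(int id, int maxDoc) {}

    // Number of per-segment bitsets computed since the last reset.
    static int work = 0;

    // Evaluates the filter over every document of one segment.
    static BitSet build(Segment seg, IntPredicate filter) {
        work++;
        BitSet bits = new BitSet(seg.maxDoc());
        for (int doc = 0; doc < seg.maxDoc(); doc++) {
            if (filter.test(doc)) bits.set(doc);
        }
        return bits;
    }

    // Top-level cache: each entry is one bitset over the whole index.
    static class TopLevelCache {
        final Map<String, BitSet> entries = new HashMap<>();

        void onNewSearcher() {
            entries.clear(); // a new searcher invalidates every whole-index entry
        }

        BitSet get(String key, IntPredicate filter, List<Segment> segments) {
            return entries.computeIfAbsent(key, k -> {
                BitSet all = new BitSet();
                int docBase = 0;
                for (Segment s : segments) {
                    BitSet segBits = build(s, filter);
                    for (int d = segBits.nextSetBit(0); d >= 0; d = segBits.nextSetBit(d + 1)) {
                        all.set(docBase + d); // local doc id -> top-level doc id
                    }
                    docBase += s.maxDoc();
                }
                return all;
            });
        }
    }

    // Per-segment cache: entries live as long as their segment does.
    static class PerSegmentCache {
        final Map<Integer, Map<String, BitSet>> bySegment = new HashMap<>();

        void onNewSearcher(List<Segment> segments) {
            Set<Integer> live = new HashSet<>();
            for (Segment s : segments) live.add(s.id());
            bySegment.keySet().retainAll(live); // drop only segments that went away
        }

        BitSet get(Segment seg, String key, IntPredicate filter) {
            return bySegment.computeIfAbsent(seg.id(), id -> new HashMap<>())
                            .computeIfAbsent(key, k -> build(seg, filter));
        }
    }

    // Returns bitset-build counts: {topWarm, segWarm, topRewarm, segRewarm}.
    static int[] simulate() {
        IntPredicate userA = doc -> doc % 3 == 0; // hypothetical access-control rule
        List<Segment> segments = new ArrayList<>(List.of(new Segment(0, 1000), new Segment(1, 1000)));
        TopLevelCache top = new TopLevelCache();
        PerSegmentCache perSeg = new PerSegmentCache();
        int[] counts = new int[4];

        for (int phase = 0; phase < 2; phase++) {
            if (phase == 1) {
                // Index one document: a one-doc segment appears, a new searcher opens.
                segments.add(new Segment(2, 1));
                top.onNewSearcher();
                perSeg.onNewSearcher(segments);
            }
            work = 0;
            top.get("userA", userA, segments);
            counts[2 * phase] = work;
            work = 0;
            for (Segment s : segments) perSeg.get(s, "userA", userA);
            counts[2 * phase + 1] = work;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate())); // prints [2, 2, 3, 1]
    }
}
```

In this model, adding one single-document segment forces the top-level cache to recompute every segment (3 builds), while the per-segment cache computes only the new segment (1 build). With thousands of per-user filters, that difference is multiplied by the number of cached entries.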

> Add postfilter support to {!terms} queries
> ------------------------------------------
>
>                 Key: SOLR-13890
>                 URL: https://issues.apache.org/jira/browse/SOLR-13890
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: query parsers
>    Affects Versions: master (9.0)
>            Reporter: Jason Gerlowski
>            Assignee: Jason Gerlowski
>            Priority: Major
>         Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch,
> SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png,
> post_optimize_performance.png
>
>
> There are some use cases where it'd be nice if the "terms" qparser created a
> query that could be run as a postfilter.
> Particularly, when users are checking for hundreds or thousands of terms, a
> postfilter implementation can be more performant than the standard processing.
> With this issue, I'd like to propose a postfilter implementation for the
> {{docValuesTermsFilter}} "method". Postfilter creation can use a
> SortedSetDocValues object to populate a DV bitset with the "terms" being
> checked for. Each document run through the postfilter can look at its
> doc values for the field in question and check them efficiently against the
> constructed bitset.
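The proposed postfilter can be illustrated with a dependency-free sketch. It reads the "DV bitset" as a bitset over doc-value ordinals and assumes a simplified per-segment stand-in for SortedSetDocValues: the hypothetical `SegmentDocValues` record holds a sorted term array (ordinal = array index) plus each document's ordinals. It is not the Lucene API, and the method names are invented for illustration. The requested terms are resolved to ordinals once, then each document reaching the postfilter is accepted if any of its ordinals is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of a docValues "terms" postfilter; SegmentDocValues is a
// simplified stand-in for per-segment SortedSetDocValues, not the Lucene class.
public class TermsPostFilterSketch {

    // Sorted term dictionary (ordinal = index) and the ordinals of each document.
    record SegmentDocValues(String[] sortedTerms, int[][] docOrds) {}

    // Done once at postfilter creation: requested terms -> ordinal bitset.
    static BitSet buildOrdBits(SegmentDocValues dv, List<String> queryTerms) {
        BitSet ords = new BitSet(dv.sortedTerms().length);
        for (String term : queryTerms) {
            int ord = Arrays.binarySearch(dv.sortedTerms(), term); // term -> ordinal lookup
            if (ord >= 0) ords.set(ord); // terms missing from this segment are skipped
        }
        return ords;
    }

    // Per-document check: cost depends on the doc's ordinals, not on the term count.
    static boolean accept(SegmentDocValues dv, BitSet ordBits, int doc) {
        for (int ord : dv.docOrds()[doc]) {
            if (ordBits.get(ord)) return true;
        }
        return false;
    }

    // Filters the documents that the main query and cheaper filters already matched.
    static List<Integer> postFilter(SegmentDocValues dv, List<String> queryTerms, int[] hits) {
        BitSet ordBits = buildOrdBits(dv, queryTerms);
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (accept(dv, ordBits, doc)) kept.add(doc);
        }
        return kept;
    }

    public static void main(String[] args) {
        SegmentDocValues dv = new SegmentDocValues(
            new String[] {"groupA", "groupB", "groupC", "groupD"},
            new int[][] {{0}, {1, 3}, {2}, {}, {0, 2}});
        int[] hits = {0, 1, 2, 3, 4}; // docs matched by the main query
        System.out.println(postFilter(dv, List.of("groupC", "groupX"), hits)); // prints [2, 4]
    }
}
```

Because the per-document work is a few BitSet lookups regardless of how many terms were requested, this style of check can be fast enough to run uncached, which is the property the comment above relies on.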