David Smiley created SOLR-14904:
-----------------------------------

             Summary: Don't use documentCache for large result sets
                 Key: SOLR-14904
                 URL: https://issues.apache.org/jira/browse/SOLR-14904
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: David Smiley


Some users ask Solr to return many documents (a high rows param), even though 
this is an anti-pattern.  Sometimes it makes sense, and Solr itself does it in 
some cases, such as "bin/solr export" and perhaps some streaming-expression 
cases.  If there is a documentCache, these queries tend to thrash it completely: 
they evict its contents and fill it with poor cache candidates.  I've even seen 
the cache itself become the bottleneck for such queries, granted with the 
now-old LRUCache and in a particularly abusive case.
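To illustrate the thrashing effect, here is a minimal sketch of a toy LRU cache built on java.util.LinkedHashMap in access order. It is an assumption for illustration only, not Solr's documentCache implementation; Solr's actual caches may use different eviction policies. CacheThrashDemo, LruCache and countHotSurvivors are hypothetical names. The sketch caches some "hot" documents, then pushes one large result set through the same cache and counts how many hot documents survive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class CacheThrashDemo {
    private CacheThrashDemo() {}

    /** Hypothetical toy LRU cache: a LinkedHashMap in access order that evicts its eldest entry once over capacity. */
    public static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        public LruCache(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true: iteration runs from least- to most-recently used
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called after each put; returning true drops the least-recently-used entry.
            return size() > maxSize;
        }
    }

    /**
     * Caches hotDocs documents (ids 0..hotDocs-1), then simulates one large request that loads
     * scanRows other documents through the same cache, and returns how many hot documents remain.
     */
    public static long countHotSurvivors(int cacheSize, int hotDocs, int scanRows) {
        LruCache<Integer, String> cache = new LruCache<>(cacheSize);
        for (int docId = 0; docId < hotDocs; docId++) {
            cache.put(docId, "doc-" + docId);
        }
        // Scan ids start after the hot range so they never collide with hot ids.
        for (int i = 0; i < scanRows; i++) {
            int docId = hotDocs + i;
            cache.put(docId, "doc-" + docId);
        }
        return cache.keySet().stream().filter(id -> id < hotDocs).count();
    }

    public static void main(String[] args) {
        // 512-entry cache holding 100 hot docs, then a rows=200 request: nothing is evicted.
        System.out.println(countHotSurvivors(512, 100, 200));   // 100
        // Same cache, then a rows=10000 request: every hot doc is evicted.
        System.out.println(countHotSurvivors(512, 100, 10000)); // 0
    }
}
```

Once a single request returns more documents than the cache can hold, a pure LRU policy keeps only the tail of that request, which is exactly the "poor cache candidates" outcome described above.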

I propose that if the number of documents to be returned exceeds some fraction 
of the documentCache's size limit, then the documentCache should not be used at 
all.  Maybe half its size is sufficient?  Or a quarter?  Perhaps the threshold 
should be at least queryResultWindowSize (thus typically at least 20)?  I see in 
solrconfig a queryResultMaxDocsCached option used for the queryResultCache; it 
could be made to apply to populating the documentCache as well.  Its code 
default is unlimited, but the default configset and most configs set it to 200.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
