Hi Erick,
the test was run on thousands of queries of that kind against millions of docs.
I went from <1500 qpm to ~6000 qpm on modest virtualized hardware (CPU-bound, and CPU was scarce).
After that the customer was happy and the time ran out, so I didn't go further, but cost is definitely something I'd try.
When I saw the CloudSearch presentation, where they explained that they enable/disable caching based on fq statistics, I thought this kind of problem was general enough that I might find a plugin already built.
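For anyone who wants to try the cost-ordered variant from the quoted reply below, here is a minimal Python sketch that assembles the request parameters. The helper names `build_fq` and `build_query` are hypothetical; the `cache` and `cost` local params and the field values are the ones used in this thread.

```python
from urllib.parse import urlencode


def build_fq(clause, cache=True, cost=None):
    """Prefix a filter clause with Solr local params for cache and cost."""
    params = []
    if not cache:
        params.append("cache=false")
    if cost is not None:
        params.append(f"cost={cost}")
    if not params:
        return clause
    return "{!" + " ".join(params) + "}" + clause


def build_query(q, filters):
    """Return a URL-encoded query string with one fq parameter per filter."""
    pairs = [("q", q)] + [("fq", fq) for fq in filters]
    return urlencode(pairs)


# Placeholder values from the thread; the most selective filter gets the lowest cost.
filters = [
    build_fq("n_rea:xxx", cache=False, cost=101),
    build_fq("provincia:yyyy", cache=False, cost=102),
    build_fq("type:zzzz", cache=False, cost=103),
]
print(build_query("*:*", filters))
```

Sending each clause as its own `fq` parameter also avoids the stray comma that joined the last two clauses in the original query.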
2016-01-05 19:17 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:

> &fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
>
> You have a comma in front of the last fq clause, typo?
>
> Well, the whole point of caching filter queries is so that the _second_
> time you use it, very little work has to be done. That comes at a cost
> of course for first-time execution. Basically any fq clause that you can
> guarantee won't be re-used should have cache=false set.
>
> I'd be surprised if the second time you use the provincia and type fq
> clauses not caching would be faster, but I've been surprised before. I
> guess anding two bitsets together could take more time than, say, testing
> a small number of individual documents....
>
> And I'm assuming that you're testing multiple queries rather than just
> one-offs.
>
> If you _do_ know that some of your clauses are very restrictive, I wonder
> what happens if you add a cost in. fq's are evaluated in cost order (when
> cache=false), so what happens in this case?
> &fq={!cache=false cost=101}n_rea:xxx&fq={!cache=false cost=102}provincia:yyyy&fq={!cache=false cost=103}type:zzzz
>
> Best,
> Erick
>
> On Tue, Jan 5, 2016 at 9:41 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote:
> > Thanks Erik and Binoy,
> > This is a case I stumbled upon: with queries like
> >
> > q=*:*&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
> >
> > where n_rea filter is highly selective
> > I was able to make > 3x performance improvement disabling cache
> >
> > I think it's because the last two filters are not so selective, they are
> > resolved to two bitset which are then anded together
> > and this is less efficient than leapfrogging since the first filter has
> > just one or two results.
> > Does it make sense to you?
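The bitset-versus-leapfrog argument above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Lucene's or Solr's actual code: Python integers stand in for cached bitsets, sorted lists stand in for doc-id iterators, and all function names and sample doc ids are hypothetical.

```python
from bisect import bisect_left


def to_bitset(doc_ids):
    """Encode doc ids as a Python int used as a bitset (one bit per doc)."""
    bits = 0
    for doc in doc_ids:
        bits |= 1 << doc
    return bits


def and_bitsets(bitsets):
    """AND whole bitsets together; the work grows with the index size."""
    result = bitsets[0]
    for bits in bitsets[1:]:
        result &= bits
    return result


def from_bitset(bits):
    """Decode a bitset back into a sorted list of doc ids."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


def leapfrog(postings):
    """Intersect sorted doc-id lists by skipping each list to the candidate."""
    lists = sorted(postings, key=len)  # most selective list leads
    if any(not plist for plist in lists):
        return []
    pos = [0] * len(lists)
    matches = []
    candidate = lists[0][0]
    while True:
        agreed = True
        for k, plist in enumerate(lists):
            # Advance list k to its first doc id >= candidate.
            pos[k] = bisect_left(plist, candidate, pos[k])
            if pos[k] == len(plist):
                return matches  # one list is exhausted, nothing more can match
            if plist[pos[k]] != candidate:
                candidate = plist[pos[k]]  # jump ahead and start over
                agreed = False
                break
        if agreed:
            matches.append(candidate)
            candidate += 1


n_rea = [5, 900]                      # highly selective filter
provincia = list(range(0, 1000, 5))   # broad filter
doc_type = list(range(0, 1000, 3))    # broad filter

print(leapfrog([n_rea, provincia, doc_type]))  # [900]
print(from_bitset(and_bitsets([to_bitset(n_rea), to_bitset(provincia), to_bitset(doc_type)])))  # [900]
```

With a two-document lead list, the leapfrog loop only probes the broad lists a handful of times, while the bitset path touches every bit of every filter regardless of how selective the first one is.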
> >
> > 2016-01-05 16:59 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
> >
> >> Matteo:
> >>
> >> Let's see if I understand your problem. Essentially you want
> >> Solr to analyze the filter queries and decide through some
> >> algorithm which ones to cache. I have a hard time thinking of
> >> any general way to do this, certainly there's nothing in Solr
> >> that does this automatically. As Binoy mentions there are some
> >> ways to influence what goes in the cache, but the algorithm is
> >> simple....
> >>
> >> If you build such a thing, I suspect you'll be implicitly building
> >> in knowledge of how your particular application uses Solr. For
> >> sure, the functionality around "no cache filters" is there explicitly
> >> because some fq clauses (think ACL calculations) can be
> >> very expensive to calculate for the entire corpus (which is what
> >> fqs do by default).
> >>
> >> But you really haven't given us some examples of what sorts
> >> of fq clauses you consider "bad". Perhaps there are other ways
> >> of approaching your problem.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Tue, Jan 5, 2016 at 7:50 AM, Binoy Dalal <binoydala...@gmail.com> wrote:
> >> > What is your exact requirement then?
> >> > I ask, because these settings can solve the problems you've mentioned
> >> > without the need to add any additional functionality.
> >> >
> >> > On Tue, Jan 5, 2016 at 9:04 PM Matteo Grolla <matteo.gro...@gmail.com> wrote:
> >> >
> >> >> Hi Binoy,
> >> >>      I know these settings but the problem I'm trying to solve is when
> >> >> these settings aren't enough.
> >> >>
> >> >>
> >> >> 2016-01-05 16:30 GMT+01:00 Binoy Dalal <binoydala...@gmail.com>:
> >> >>
> >> >> > If I understand your problem correctly, then you don't want the most
> >> >> > frequently used fqs removed and you do not want your filter cache to grow
> >> >> > to very large sizes.
> >> >> > Well there is already a solution for both of these.
> >> >> > In the solrconfig.xml file, you can configure the <filterCache> parameter
> >> >> > to suit your needs.
> >> >> > a) Use the LeastFrequentlyUsed or LFU eviction policy.
> >> >> > b) Set the size to whatever number of fqs you find suitable.
> >> >> > You can do this like so:
> >> >> > <filterCache class="solr.LFUCache" size="100" initialSize="10"
> >> >> > autoWarmCount="10"/>
> >> >> > You should play around with these parameters to find the best combination
> >> >> > for your implementation.
> >> >> > For more details take a look here:
> >> >> > https://wiki.apache.org/solr/SolrCaching
> >> >> > http://yonik.com/advanced-filter-caching-in-solr/
> >> >> >
> >> >> >
> >> >> > On Tue, Jan 5, 2016 at 7:28 PM Matteo Grolla <matteo.gro...@gmail.com> wrote:
> >> >> >
> >> >> > > Hi,
> >> >> > >     after looking at the presentation of cloudsearch from lucene
> >> >> > > revolution 2014
> >> >> > >
> >> >> > > https://www.youtube.com/watch?v=RI1x0d-yO8A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP&index=49
> >> >> > > min 17:08
> >> >> > >
> >> >> > > I recognized I'd love to be able to remove the burden of disabling filter
> >> >> > > query caching from developers
> >> >> > >
> >> >> > > the problem:
> >> >> > > Solr by default caches filter queries
> >> >> > > a) When there are filter queries that are not reused and few that are, the
> >> >> > > good ones get evicted unnecessarily
> >> >> > > b) if the same query has multiple filter queries that are very selective I
> >> >> > > noticed a big performance improvement disabling cache
> >> >> > > c) I'd like to spare developers from deciding what has to be cached or not
> >> >> > >
> >> >> > > the question:
> >> >> > > -Is there anything already working to solve those problems?
> >> >> > >
> >> >> > > what do you think about this?
> >> >> > > -I was thinking to write a plugin to recognize query types with regular
> >> >> > > expressions and let solr admins associate a caching behaviour with each
> >> >> > > query type
> >> >> > > -another idea was to
> >> >> > >     -by default set fq caching off
> >> >> > >     -keep statistics about fq
> >> >> > >     -enable caching only for the N fq with highest hit ratio
> >> >> > >
> >> >> > --
> >> >> > Regards,
> >> >> > Binoy Dalal
> >> >> >
> >> >>
> >> > --
> >> > Regards,
> >> > Binoy Dalal
> >> >
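The statistics-based idea from the original post (caching off by default, keep per-fq statistics, enable caching only for the N fqs with the highest hit ratio) could be prototyped outside Solr roughly as follows. This is a minimal Python sketch under assumptions: `FqCacheAdvisor` is a hypothetical name, not an existing Solr plugin, and "hit ratio" is approximated as the fraction of uses that would have been served from cache if the fq had been cached after its first use.

```python
from collections import Counter


class FqCacheAdvisor:
    """Hypothetical helper: recommend caching only for the N most reused fqs."""

    def __init__(self, top_n):
        self.top_n = top_n
        self.seen = Counter()  # fq string -> number of times it was used

    def record(self, fq):
        """Record one use of a filter query."""
        self.seen[fq] += 1

    def hit_ratio(self, fq):
        """Estimated hit ratio: every use after the first would be a hit."""
        uses = self.seen[fq]
        return (uses - 1) / uses if uses else 0.0

    def cacheable(self):
        """Return the top-N fqs by estimated hit ratio, skipping one-offs."""
        ranked = sorted(self.seen, key=lambda fq: (-self.hit_ratio(fq), fq))
        return {fq for fq in ranked[: self.top_n] if self.seen[fq] > 1}

    def should_cache(self, fq):
        """Caching is off by default and on only for fqs in the top N."""
        return fq in self.cacheable()


advisor = FqCacheAdvisor(top_n=2)
for fq in ["type:zzzz"] * 5 + ["provincia:yyyy"] * 3 + ["n_rea:1", "n_rea:2"]:
    advisor.record(fq)

print(advisor.should_cache("type:zzzz"))  # True
print(advisor.should_cache("n_rea:1"))    # False
```

A caller could use `should_cache` to decide whether to prepend `{!cache=false}` to an fq before sending the request; a real implementation would also need to age out old statistics, which this sketch ignores.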