Binoy:

bq: In such a case won't applying fqs normally be the same as applying them as post filters?
Certainly not, at least AFAIK...

By definition, regular FQs are calculated over the entire corpus (NOT just the docs that satisfy the query). That entire bitset is then stored in the filterCache, which is why filterCache entries can be reused by different queries.

Also by definition, post filters are _not_ calculated over the entire corpus. They are only calculated for docs that 1> pass the query criteria and 2> pass all lower-cost filters, so they do not carry over to the next query at all, are not stored in the filterCache, etc.

So I think what Matteo is seeing is that with a restrictive FQ clause, very few docs have to be tested against most of the FQs.

Matteo:

My guess (and I'm not intimately familiar with the code) is that the restrictive clause is indeed helping you a lot here. Frankly, I doubt that adding a cost will make a measurable difference if the most restrictive FQ clause is quite sparse....

I'm still puzzled why, in your test scenario, there is such a difference when making all the filter queries cache=false. _Assuming_ that provincia and type are relatively low-cardinality fields, they should all be in the filterCache pretty quickly. But perhaps ANDing the bitsets together is more expensive than the advantage in this case. I'd be curious as to the hit ratio you were seeing.

But as you say, if the client is satisfied I'm not sure it's worth pursuing...

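To make the distinction above concrete, here is a minimal toy sketch in Python. It is not Solr code and makes no claim about how Solr implements either path: a "bitset" is modelled as a plain set of doc ids, and the names `corpus_wide_set`, `check_candidates`, the field values and the corpus itself are all invented for illustration. It only shows the difference in work between building corpus-wide sets and ANDing them, versus testing the handful of docs that survive a very sparse clause.

```python
# Toy model only: a "bitset" is a Python set of doc ids.
# All names and data below are hypothetical, not Solr APIs.

def corpus_wide_set(corpus, predicate):
    """Regular fq style: evaluate the clause against every doc in the corpus."""
    return {doc_id for doc_id, doc in corpus.items() if predicate(doc)}


def check_candidates(corpus, candidates, predicates_by_cost):
    """Uncached style: check only surviving candidates, lowest cost first.

    Returns the surviving doc ids and the number of per-document checks made.
    """
    checks = 0
    survivors = set(candidates)
    for _cost, predicate in sorted(predicates_by_cost, key=lambda pair: pair[0]):
        kept = set()
        for doc_id in survivors:
            checks += 1
            if predicate(corpus[doc_id]):
                kept.add(doc_id)
        survivors = kept
    return survivors, checks


def is_n_rea(doc):
    return doc["n_rea"] == 4230       # very selective: one doc


def is_provincia(doc):
    return doc["provincia"] == 0      # low cardinality: many docs


def is_type(doc):
    return doc["type"] == 0           # low cardinality: many docs


corpus = {
    i: {"n_rea": i, "provincia": i % 10, "type": i % 3}
    for i in range(10_000)
}

# Cached path: one corpus-wide set per clause, then AND them together.
filter_cache = {
    "n_rea:4230": corpus_wide_set(corpus, is_n_rea),
    "provincia:0": corpus_wide_set(corpus, is_provincia),
    "type:0": corpus_wide_set(corpus, is_type),
}
cached_result = set.intersection(*filter_cache.values())

# Uncached path: start from the docs matching the sparse clause
# (a stand-in for whatever the index returns for it), then test
# the remaining clauses on those docs only.
candidates = corpus_wide_set(corpus, is_n_rea)
uncached_result, checks = check_candidates(
    corpus, candidates, [(102, is_provincia), (103, is_type)]
)

print(cached_result, uncached_result, checks)
```

Both paths return the same doc, but in this toy the second one only performs a couple of per-document checks, whereas the first has to hold and intersect sets with thousands of entries. Whether that trade-off holds in a real index depends on the hit ratio of the cache, which is why I'd still like to see it.
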
Best,
Erick

On Tue, Jan 5, 2016 at 11:09 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote:

> Hi Erick,
> the test was done on thousands of queries of that kind and millions of docs.
> I went from <1500 qpm to ~6000 qpm on modest virtualized hardware (CPU
> bound, and CPU was scarce).
> After that the customer was happy, time ran out and I didn't go further, but
> cost was definitely something I'd try.
> When I saw the CloudSearch presentation where they explained that they
> were enabling/disabling caching based on fq statistics, I thought this kind
> of problem was general enough that I could find a plugin already built.
>
> 2016-01-05 19:17 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>
>> &fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
>>
>> You have a comma in front of the last fq clause, typo?
>>
>> Well, the whole point of caching filter queries is so that the
>> _second_ time you use it, very little work has to be done. That comes
>> at a cost of course for first-time execution. Basically any fq clause
>> that you can guarantee won't be re-used should have cache=false set.
>>
>> I'd be surprised if the second time you use the provincia and type fq
>> clauses not caching would be faster, but I've been surprised before.
>> I guess ANDing two bitsets together could take more time than, say,
>> testing a small number of individual documents....
>>
>> And I'm assuming that you're testing multiple queries rather than just
>> one-offs.
>>
>> If you _do_ know that some of your clauses are very restrictive, I
>> wonder what happens if you add a cost in. fq's are evaluated in cost
>> order (when cache=false), so what happens in this case?
>>
>> &fq={!cache=false cost=101}n_rea:xxx&fq={!cache=false cost=102}provincia:yyyy&fq={!cache=false cost=103}type:zzzz
>>
>> Best,
>> Erick
>>
>> On Tue, Jan 5, 2016 at 9:41 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote:
>> > Thanks Erick and Binoy,
>> > This is a case I stumbled upon: with queries like
>> >
>> > q=*:*&fq={!cache=false}n_rea:xxx&fq={!cache=false}provincia:yyyy,fq={!cache=false}type:zzzz
>> >
>> > where the n_rea filter is highly selective,
>> > I was able to get a > 3x performance improvement by disabling the cache.
>> >
>> > I think it's because the last two filters are not so selective: they are
>> > resolved to two bitsets which are then ANDed together,
>> > and this is less efficient than leapfrogging since the first filter has
>> > just one or two results.
>> > Does it make sense to you?
>> >
>> > 2016-01-05 16:59 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>> >
>> >> Matteo:
>> >>
>> >> Let's see if I understand your problem. Essentially you want
>> >> Solr to analyze the filter queries and decide through some
>> >> algorithm which ones to cache. I have a hard time thinking of
>> >> any general way to do this, certainly there's nothing in Solr
>> >> that does this automatically. As Binoy mentions there are some
>> >> ways to influence what goes in the cache, but the algorithm is
>> >> simple....
>> >>
>> >> If you build such a thing, I suspect you'll be implicitly building
>> >> in knowledge of how your particular application uses Solr. For
>> >> sure, the functionality around "no cache filters" is there explicitly
>> >> because some fq clauses (think ACL calculations) can be
>> >> very expensive to calculate for the entire corpus (which is what
>> >> fqs do by default).
>> >>
>> >> But you really haven't given us any examples of what sorts
>> >> of fq clauses you consider "bad". Perhaps there are other ways
>> >> of approaching your problem.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Tue, Jan 5, 2016 at 7:50 AM, Binoy Dalal <binoydala...@gmail.com> wrote:
>> >> > What is your exact requirement then?
>> >> > I ask, because these settings can solve the problems you've mentioned
>> >> > without the need to add any additional functionality.
>> >> >
>> >> > On Tue, Jan 5, 2016 at 9:04 PM Matteo Grolla <matteo.gro...@gmail.com> wrote:
>> >> >
>> >> >> Hi Binoy,
>> >> >> I know these settings but the problem I'm trying to solve is when
>> >> >> these settings aren't enough.
>> >> >>
>> >> >> 2016-01-05 16:30 GMT+01:00 Binoy Dalal <binoydala...@gmail.com>:
>> >> >>
>> >> >> > If I understand your problem correctly, then you don't want the most
>> >> >> > frequently used fqs removed and you do not want your filter cache to grow
>> >> >> > to very large sizes.
>> >> >> > Well there is already a solution for both of these.
>> >> >> > In the solrconfig.xml file, you can configure the <filterCache> parameter
>> >> >> > to suit your needs.
>> >> >> > a) Use the LeastFrequentlyUsed or LFU eviction policy.
>> >> >> > b) Set the size to whatever number of fqs you find suitable.
>> >> >> > You can do this like so:
>> >> >> > <filterCache class="solr.LFUCache" size="100" initialSize="10"
>> >> >> > autowarmCount="10"/>
>> >> >> > You should play around with these parameters to find the best combination
>> >> >> > for your implementation.
>> >> >> > For more details take a look here:
>> >> >> > https://wiki.apache.org/solr/SolrCaching
>> >> >> > http://yonik.com/advanced-filter-caching-in-solr/
>> >> >> >
>> >> >> > On Tue, Jan 5, 2016 at 7:28 PM Matteo Grolla <matteo.gro...@gmail.com> wrote:
>> >> >> >
>> >> >> > > Hi,
>> >> >> > > after looking at the presentation of CloudSearch from Lucene Revolution 2014
>> >> >> > >
>> >> >> > > https://www.youtube.com/watch?v=RI1x0d-yO8A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP&index=49
>> >> >> > > min 17:08
>> >> >> > >
>> >> >> > > I realized I'd love to be able to remove the burden of disabling filter
>> >> >> > > query caching from developers.
>> >> >> > >
>> >> >> > > the problem:
>> >> >> > > Solr by default caches filter queries
>> >> >> > > a) When there are filter queries that are not reused and a few that are,
>> >> >> > > the good ones get evicted unnecessarily
>> >> >> > > b) if the same query has multiple filter queries that are very selective, I
>> >> >> > > noticed a big performance improvement when disabling the cache
>> >> >> > > c) I'd like to spare developers from deciding what has to be cached or not
>> >> >> > >
>> >> >> > > the question:
>> >> >> > > -Is there anything already working to solve those problems?
>> >> >> > >
>> >> >> > > what do you think about this?
>> >> >> > > -I was thinking to write a plugin to recognize query types with regular
>> >> >> > > expressions and let Solr admins associate a caching behaviour with each
>> >> >> > > query type
>> >> >> > > -another idea was to
>> >> >> > >   -by default set fq caching off
>> >> >> > >   -keep statistics about fq
>> >> >> > >   -enable caching only for the N fq with highest hit ratio
>> >> >> > >
>> >> >> > --
>> >> >> > Regards,
>> >> >> > Binoy Dalal
>> >> >> >
>> >> >>
>> >> > --
>> >> > Regards,
>> >> > Binoy Dalal
>> >>
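
For what it's worth, the last idea in the original message (fq caching off by default, keep statistics, cache only the top N) could be prototyped on the client side before anyone writes a Solr plugin. The sketch below is only that, a rough Python illustration under loud assumptions: `FqCachePolicy` is a hypothetical helper, not an existing Solr component, and it uses a simple "times seen" count as a stand-in for the hit ratio mentioned in the thread. The only Solr syntax it relies on is the `{!cache=false}` local param already used above.

```python
from collections import Counter


class FqCachePolicy:
    """Hypothetical client-side helper, not part of Solr.

    Counts how often each fq string is requested and leaves caching on
    only for the top_n most frequent ones; every other fq is rewritten
    with the {!cache=false} local param.
    """

    def __init__(self, top_n):
        self.top_n = top_n
        self.seen = Counter()

    def record(self, fq):
        """Note that this fq clause was used by a query."""
        self.seen[fq] += 1

    def cacheable(self):
        """Return the fq strings currently allowed into the filterCache."""
        return {fq for fq, _count in self.seen.most_common(self.top_n)}

    def rewrite(self, fq):
        """Return the fq as it should be sent to Solr."""
        if fq in self.cacheable():
            return fq
        return "{!cache=false}" + fq


policy = FqCachePolicy(top_n=2)
for fq in ["provincia:yyyy"] * 5 + ["type:zzzz"] * 3 + ["n_rea:xxx"]:
    policy.record(fq)

frequent = policy.rewrite("provincia:yyyy")
rare = policy.rewrite("n_rea:xxx")
print(frequent, rare)
```

As noted earlier in the thread, anything like this ends up encoding knowledge of how one particular application uses Solr, so the counting rule and the value of N would have to be tuned per deployment.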