Re: Facet Performance

James Bodkin Tue, 16 Jun 2020 07:48:18 -0700

I've been trying to build a query that I can use in newSearcher based off the 
information in your previous e-mail. I thought you meant to build a *:* query 
as per Query 1 in my previous e-mail but I'm still seeing the first-hit 
execution.
Now I'm wondering if you meant to create a *:* query with each of the fields as 
part of the fl query parameters or a *:* query with each of the fields and 
values as part of the fq query parameters.


At the moment I've been running these manually, as I expected the 
first-execution penalty to disappear by the time I got to query 4; I 
thought this would replicate the actions of the newSearcher.
Unfortunately we can't use the autowarm count that is available as part of the 
queryResultCache/filterCache due to the custom deployment mechanism we use to 
update our index.
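
For reference, the manual runs of Queries 1 to 4 (quoted below) can be 
scripted. This is only a minimal sketch: the host, port and collection name 
in SOLR_SELECT_URL are placeholders, the helper names are made up for 
illustration, and whether replaying these requests warms everything that is 
needed is exactly the open question in this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: host, port and collection name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"

FACET_FIELDS = ["D_DepartureAirport", "D_Destination"]


def warming_params(facet_fields, fq=None):
    """Return the parameters of a rows=0 facet query as (name, value) pairs."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"), ("facet.limit", "-1")]
    # facet.field may repeat, so pairs are used instead of a dict.
    params += [("facet.field", field) for field in facet_fields]
    if fq is not None:
        params.append(("fq", fq))
    return params


def warming_url(facet_fields, fq=None, base_url=SOLR_SELECT_URL):
    """Build the full request URL; urlencode handles the escaping."""
    return base_url + "?" + urlencode(warming_params(facet_fields, fq))


def send(url, timeout=30):
    """Fire one warming request and return the HTTP status (not called here)."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


# Queries 1 to 4 from the quoted e-mail, in the same order.
queries = [
    warming_url(FACET_FIELDS),
    warming_url(["D_Destination"], fq="D_DepartureAirport:(2660)"),
    warming_url(["D_Destination"], fq="D_DepartureAirport:(2661)"),
    warming_url(["D_Destination"], fq="D_DepartureAirport:(2660 OR 2661)"),
]

for url in queries:
    print(url)
```

Calling send(url) for each entry after a deployment would replay the same 
requests against a live instance; the sketch only prints the URLs.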

Kind Regards,

James Bodkin

On 16/06/2020, 15:30, "Erick Erickson" <[email protected]> wrote:

    Did you try the autowarming like I mentioned in my previous e-mail?

    > On Jun 16, 2020, at 10:18 AM, James Bodkin 
<[email protected]> wrote:
    > 
    > We've changed the schema to enable docValues for these fields and this 
led to an improvement in the response time. We found a further improvement by 
also switching off indexed as these fields are used for faceting and filtering 
only.
    > Since those changes, we've found that the first-execution penalty for 
queries is really noticeable. I thought this would be the filterCache, based on 
what I saw in NewRelic; however, it is probably trying to read the docValues 
from disk. How can we use the autowarming to improve this?
    > 
    > For example, I've run the following queries in sequence and each query 
has a first-execution penalty.
    > 
    > Query 1:
    > 
    > q=*:*
    > facet=true
    > facet.field=D_DepartureAirport
    > facet.field=D_Destination
    > facet.limit=-1
    > rows=0
    > 
    > Query 2:
    > 
    > q=*:*
    > fq=D_DepartureAirport:(2660) 
    > facet=true
    > facet.field=D_Destination
    > facet.limit=-1
    > rows=0
    > 
    > Query 3:
    > 
    > q=*:*
    > fq=D_DepartureAirport:(2661)
    > facet=true
    > facet.field=D_Destination
    > facet.limit=-1
    > rows=0
    > 
    > Query 4:
    > 
    > q=*:*
    > fq=D_DepartureAirport:(2660+OR+2661)
    > facet=true
    > facet.field=D_Destination
    > facet.limit=-1
    > rows=0
    > 
    > We've kept the field type as a string, as the value is mapped by 
the application that accesses Solr. In the examples above, the values are mapped to 
airports and destinations.
    > Is it possible to prewarm the above queries without having to define all 
the potential filters manually in the auto warming?
    > 
    > At the moment, we update and optimise our index in a different 
environment and then copy the index to our production instances by using a 
rolling deployment in Kubernetes.
    > 
    > Kind Regards,
    > 
    > James Bodkin
    > 
    > On 12/06/2020, 18:58, "Erick Erickson" <[email protected]> wrote:
    > 
    >    I question whether filterCache has anything to do with it. I suspect 
what’s really happening is that the first time you’re reading the relevant bits 
from disk into memory. And to double check, you should have docValues enabled for 
all these fields. The “uninverting” process can be very expensive, and 
docValues bypasses that.
    > 
    >    As of Solr 7.6, you can add “uninvertible=false” to your field(Type) 
to “fail fast” if Solr needs to uninvert the field.
    > 
    >    But that’s an aside. In either case, my claim is that first-time 
execution does “something”, either reads the serialized docValues from disk or 
uninverts the field on Solr’s heap.
    > 
    >    You can have this autowarmed by any combination of
    >    1> specifying an autowarm count on your queryResultCache. That’s hit 
or miss, as it replays the most recent N queries which may or may not contain 
the sorts. That said, specifying 10-20 for autowarm count is usually a good 
idea, assuming you’re not committing more than, say, every 30 seconds. I’d add 
the same to filterCache too.
    > 
    >    2> specifying a newSearcher or firstSearcher query in solrconfig.xml. 
The difference is that newSearcher is fired every time a commit happens, while 
firstSearcher is only fired when Solr starts, the theory being that there’s no 
cache autowarming available when Solr first powers up. Usually, people don’t 
bother with firstSearcher or just make it the same as newSearcher. Note that a 
query doesn’t have to be “real” at all. You can just add all the facet fields 
to a *:* query in a single go.
    > 
    >    BTW, Trie fields will stay around for a long time even though 
deprecated. Or at least until we find something to replace them with that 
doesn’t have this penalty, so I’d feel pretty safe using those and they’ll be 
more efficient than strings.
    > 
    >    Best,
    >    Erick
    >
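The listener described in point 2 of the quoted message lives in the query 
section of solrconfig.xml. As a rough sketch only, the script below generates 
such an entry for both newSearcher and firstSearcher, assuming the stock 
solr.QuerySenderListener class and the two facet fields from Query 1. The 
function names are made up for illustration, and the output has not been 
checked against the deployment discussed in this thread.

```python
import xml.etree.ElementTree as ET


def query_sender_listener(event, facet_fields):
    """Build a <listener> element that fires one *:* facet query on `event`."""
    listener = ET.Element(
        "listener", {"event": event, "class": "solr.QuerySenderListener"}
    )
    queries = ET.SubElement(listener, "arr", {"name": "queries"})
    query = ET.SubElement(queries, "lst")
    # One query that touches every facet field in a single go.
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    for name, value in params:
        ET.SubElement(query, "str", {"name": name}).text = value
    return listener


def render(element):
    """Serialise the element, pretty-printed where the runtime supports it."""
    if hasattr(ET, "indent"):  # available from Python 3.9
        ET.indent(element, space="  ")
    return ET.tostring(element, encoding="unicode")


fields = ["D_DepartureAirport", "D_Destination"]
snippets = [
    render(query_sender_listener(event, fields))
    for event in ("newSearcher", "firstSearcher")
]
print("\n".join(snippets))
```

Since firstSearcher fires when Solr starts and newSearcher fires on commit, 
emitting the same query for both events mirrors the "just make it the same as 
newSearcher" remark above.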
