Re: Facet Performance

David Smiley Wed, 17 Jun 2020 06:38:18 -0700

I strongly recommend setting indexed=true on a field you facet on for the
purposes of efficient refinement (fq=field:value).  Strictly speaking, though,
it isn't required, as you have discovered.
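
As a rough sketch of what that could look like in the schema: the field
names are taken from the queries quoted below, while the "string" type
and stored="false" are assumptions for illustration rather than anything
stated in this thread.

```xml
<!-- Assumed schema fragment; "string" is presumed to map to solr.StrField. -->
<!-- indexed="true" keeps fq=field:value refinement efficient. -->
<!-- docValues="true" serves the faceting itself. -->
<!-- uninvertible="false" makes Solr fail fast instead of uninverting on the heap. -->
<field name="D_DepartureAirport" type="string" indexed="true" stored="false"
       docValues="true" uninvertible="false"/>
<field name="D_Destination" type="string" indexed="true" stored="false"
       docValues="true" uninvertible="false"/>
```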


~ David


On Wed, Jun 17, 2020 at 9:02 AM Michael Gibney <[email protected]>
wrote:

> facet.method=enum works by executing a query (against indexed values)
> for each indexed value in a given field (which, for indexed=false, is
> "no values"). So that explains why facet.method=enum no longer works.
> I was going to suggest that you might not want to set indexed=false on
> the docValues facet fields anyway, since the indexed values are still
> used for facet refinement (assuming your index is distributed).
>
> What's the number of unique values in the relevant fields? If it's low
> enough, setting docValues=false and indexed=true and using
> facet.method=enum (with a sufficiently large filterCache) is
> definitely a viable option, and will almost certainly be faster than
> docValues-based faceting. (As an aside, noting for future reference:
> high-cardinality facets over high-cardinality DocSet domains might be
> able to benefit from a term facet count cache:
> https://issues.apache.org/jira/browse/SOLR-13807)
>
> I think you didn't specifically mention whether you acted on Erick's
> suggestion of setting "uninvertible=false" (I think Erick accidentally
> said "uninvertible=true") to fail fast. I'd also recommend doing that,
> perhaps even above all else -- it shouldn't actually *do* anything,
> but will help ensure that things are behaving as you expect them to!
>
> Michael
>
> On Wed, Jun 17, 2020 at 4:31 AM James Bodkin
> <[email protected]> wrote:
> >
> > Thanks, I've implemented some queries that improve the first-hit
> > execution for faceting.
> >
> > Since turning off indexed on those fields, we've noticed that
> > facet.method=enum no longer returns the facets when used.
> > Using facet.method=fc/fcs is significantly slower compared to
> > facet.method=enum for us. Why do these two differences exist?
> >
> > On 16/06/2020, 17:52, "Erick Erickson" <[email protected]> wrote:
> >
> >     Ok, I see the disconnect... Necessary parts of the index are read from disk
> >     lazily. So your newSearcher or firstSearcher query needs to do whatever
> >     operation causes the relevant parts of the index to be read. In this case,
> >     probably just facet on all the fields you care about. I'd add sorting too
> >     if you sort on different fields.
> >
> >     The *:* query without facets or sorting does virtually nothing due to some
> >     special handling...
> >
> >     On Tue, Jun 16, 2020, 10:48 James Bodkin <[email protected]>
> >     wrote:
> >
> >     > I've been trying to build a query that I can use in newSearcher based off
> >     > the information in your previous e-mail. I thought you meant to build a *:*
> >     > query as per Query 1 in my previous e-mail but I'm still seeing the
> >     > first-hit execution.
> >     > Now I'm wondering if you meant to create a *:* query with each of the
> >     > fields as part of the fl query parameters or a *:* query with each of the
> >     > fields and values as part of the fq query parameters.
> >     >
> >     > At the moment I've been running these manually as I expected that I would
> >     > see the first-execution penalty disappear by the time I got to query 4, as
> >     > I thought this would replicate the actions of the newSearcher.
> >     > Unfortunately we can't use the autowarm count that is available as part of
> >     > the filterCache/queryResultCache due to the custom deployment mechanism we use
> >     > to update our index.
> >     >
> >     > Kind Regards,
> >     >
> >     > James Bodkin
> >     >
> >     > On 16/06/2020, 15:30, "Erick Erickson" <[email protected]> wrote:
> >     >
> >     >     Did you try the autowarming like I mentioned in my previous e-mail?
> >     >
> >     >     > On Jun 16, 2020, at 10:18 AM, James Bodkin <[email protected]> wrote:
> >     >     >
> >     >     > We've changed the schema to enable docValues for these fields and
> >     >     > this led to an improvement in the response time. We found a further
> >     >     > improvement by also switching off indexed as these fields are used for
> >     >     > faceting and filtering only.
> >     >     > Since those changes, we've found that the first-execution penalty for
> >     >     > queries is really noticeable. I thought this would be the filterCache based
> >     >     > on what I saw in NewRelic; however, it is probably trying to read the
> >     >     > docValues from disk. How can we use the autowarming to improve this?
> >     >     >
> >     >     > For example, I've run the following queries in sequence and each
> >     >     > query has a first-execution penalty.
> >     >     >
> >     >     > Query 1:
> >     >     >
> >     >     > q=*:*
> >     >     > facet=true
> >     >     > facet.field=D_DepartureAirport
> >     >     > facet.field=D_Destination
> >     >     > facet.limit=-1
> >     >     > rows=0
> >     >     >
> >     >     > Query 2:
> >     >     >
> >     >     > q=*:*
> >     >     > fq=D_DepartureAirport:(2660)
> >     >     > facet=true
> >     >     > facet.field=D_Destination
> >     >     > facet.limit=-1
> >     >     > rows=0
> >     >     >
> >     >     > Query 3:
> >     >     >
> >     >     > q=*:*
> >     >     > fq=D_DepartureAirport:(2661)
> >     >     > facet=true
> >     >     > facet.field=D_Destination
> >     >     > facet.limit=-1
> >     >     > rows=0
> >     >     >
> >     >     > Query 4:
> >     >     >
> >     >     > q=*:*
> >     >     > fq=D_DepartureAirport:(2660+OR+2661)
> >     >     > facet=true
> >     >     > facet.field=D_Destination
> >     >     > facet.limit=-1
> >     >     > rows=0
> >     >     >
> >     >     > We've kept the field type as a string, as the value is mapped by
> >     >     > the application that accesses Solr. In the examples above, the values are
> >     >     > mapped to airports and destinations.
> >     >     > Is it possible to prewarm the above queries without having to define
> >     >     > all the potential filters manually in the auto warming?
> >     >     >
> >     >     > At the moment, we update and optimise our index in a different
> >     >     > environment and then copy the index to our production instances by using a
> >     >     > rolling deployment in Kubernetes.
> >     >     >
> >     >     > Kind Regards,
> >     >     >
> >     >     > James Bodkin
> >     >     >
> >     >     > On 12/06/2020, 18:58, "Erick Erickson" <[email protected]> wrote:
> >     >     >
> >     >     >    I question whether filterCache has anything to do with it; I
> >     >     >    suspect what’s really happening is that the first time you’re reading the
> >     >     >    relevant bits from disk into memory. And to double check, you should have
> >     >     >    docValues enabled for all these fields. The “uninverting” process can be
> >     >     >    very expensive, and docValues bypasses that.
> >     >     >
> >     >     >    As of Solr 7.6, you can define “uninvertible=true” to your
> >     >     >    field(Type) to “fail fast” if Solr needs to uninvert the field.
> >     >     >
> >     >     >    But that’s an aside. In either case, my claim is that first-time
> >     >     >    execution does “something”: it either reads the serialized docValues from
> >     >     >    disk or uninverts the field on Solr’s heap.
> >     >     >
> >     >     >    You can have this autowarmed by any combination of
> >     >     >    1> specifying an autowarm count on your queryResultCache. That’s
> >     >     >    hit or miss, as it replays the most recent N queries which may or may not
> >     >     >    contain the sorts. That said, specifying 10-20 for autowarm count is
> >     >     >    usually a good idea, assuming you’re not committing more than, say, every
> >     >     >    30 seconds. I’d add the same to filterCache too.
> >     >     >
> >     >     >    2> specifying a newSearcher or firstSearcher query in
> >     >     >    solrconfig.xml. The difference is that newSearcher is fired every time a
> >     >     >    commit happens, while firstSearcher is only fired when Solr starts, the
> >     >     >    theory being that there’s no cache autowarming available when Solr first
> >     >     >    powers up. Usually, people don’t bother with firstSearcher or just make it
> >     >     >    the same as newSearcher. Note that a query doesn’t have to be “real” at
> >     >     >    all. You can just add all the facet fields to a *:* query in a single go.
> >     >     >
> >     >     >    BTW, Trie fields will stay around for a long time even though
> >     >     >    deprecated. Or at least until we find something to replace them with that
> >     >     >    doesn’t have this penalty, so I’d feel pretty safe using those and they’ll
> >     >     >    be more efficient than strings.
> >     >     >
> >     >     >    Best,
> >     >     >    Erick
> >     >     >
> >     >
> >     >
>
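
For anyone following the warming discussion quoted above, here is a
minimal sketch of a warming query that facets on the relevant fields in a
single go. It assumes the listeners sit inside the <query> section of
solrconfig.xml and reuses the field names and facet.limit from the quoted
queries. No sort parameters are shown because the thread does not name
any sort fields. With an index that is copied into place and served by
freshly started instances, firstSearcher is probably the listener that
matters most.

```xml
<!-- Assumed solrconfig.xml fragment, placed inside the <query> section. -->
<!-- Fired after each commit that opens a new searcher. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <!-- Faceting on each field forces its docValues to be read from disk. -->
      <str name="facet.field">D_DepartureAirport</str>
      <str name="facet.field">D_Destination</str>
      <str name="facet.limit">-1</str>
    </lst>
  </arr>
</listener>

<!-- Fired once when Solr starts, when no caches exist to autowarm from. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">D_DepartureAirport</str>
      <str name="facet.field">D_Destination</str>
      <str name="facet.limit">-1</str>
    </lst>
  </arr>
</listener>
```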
