Re: Facet Performance

Anthony Groves Wed, 17 Jun 2020 06:57:16 -0700

Ah, interesting! So if the number of possible values is low (like <= 10),
it is faster to *not* use docValues on that (indexed) faceted field?
Does this hold true even when using faceting techniques like tag and
exclusion?
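
For concreteness, a tag-and-exclusion request that also forces
facet.method=enum on the faceted field could be built roughly like this.
It is only a sketch: the helper name and the "sel" tag are made up for
illustration, the field and value are borrowed from the queries quoted
further down, and whether enum is actually faster in this setup is exactly
the open question.

```python
from urllib.parse import urlencode


def tagged_enum_facet_params(field, selected_value, tag="sel"):
    """Build Solr request parameters for a tag/exclusion facet on one field.

    Hypothetical helper: the name and the default tag are illustrative only.
    """
    return [
        ("q", "*:*"),
        ("rows", "0"),
        # Tag the filter so the facet on the same field can exclude it.
        ("fq", f"{{!tag={tag}}}{field}:{selected_value}"),
        ("facet", "true"),
        # Count this field as if the tagged filter had not been applied.
        ("facet.field", f"{{!ex={tag}}}{field}"),
        # Per-field override of the faceting method.
        (f"f.{field}.facet.method", "enum"),
    ]


params = tagged_enum_facet_params("D_DepartureAirport", "2660")
print(urlencode(params))
```

The printed string is the URL-encoded query string for the select handler.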


Thanks,
Anthony


On Wed, Jun 17, 2020 at 9:37 AM David Smiley <david.w.smi...@gmail.com>
wrote:

> I strongly recommend setting indexed=true on a field you facet on for the
> purposes of efficient refinement (fq=field:value).  But it strictly isn't
> required, as you have discovered.
>
> ~ David
>
>
> On Wed, Jun 17, 2020 at 9:02 AM Michael Gibney <mich...@michaelgibney.net>
> wrote:
>
> > facet.method=enum works by executing a query (against indexed values)
> > for each indexed value in a given field (which, for indexed=false, is
> > "no values"). So that explains why facet.method=enum no longer works.
> > I was going to suggest that you might not want to set indexed=false on
> > the docValues facet fields anyway, since the indexed values are still
> > used for facet refinement (assuming your index is distributed).
> >
> > What's the number of unique values in the relevant fields? If it's low
> > enough, setting docValues=false and indexed=true and using
> > facet.method=enum (with a sufficiently large filterCache) is
> > definitely a viable option, and will almost certainly be faster than
> > docValues-based faceting. (As an aside, noting for future reference:
> > high-cardinality facets over high-cardinality DocSet domains might be
> > able to benefit from a term facet count cache:
> > https://issues.apache.org/jira/browse/SOLR-13807)
> >
> > I think you didn't specifically mention whether you acted on Erick's
> > suggestion of setting "uninvertible=false" (I think Erick accidentally
> > said "uninvertible=true") to fail fast. I'd also recommend doing that,
> > perhaps even above all else -- it shouldn't actually *do* anything,
> > but will help ensure that things are behaving as you expect them to!
> >
> > Michael
> >
> > On Wed, Jun 17, 2020 at 4:31 AM James Bodkin
> > <james.bod...@loveholidays.com> wrote:
> > >
> > > Thanks, I've implemented some queries that improve the first-hit
> > execution for faceting.
> > >
> > > Since turning off indexed on those fields, we've noticed that
> > facet.method=enum no longer returns the facets when used.
> > > Using facet.method=fc/fcs is significantly slower compared to
> > facet.method=enum for us. Why do these two differences exist?
> > >
> > > On 16/06/2020, 17:52, "Erick Erickson" <erickerick...@gmail.com>
> wrote:
> > >
> > >     Ok, I see the disconnect... Necessary parts of the index are read
> > from disk
> > >     lazily. So your newSearcher or firstSearcher query needs to do
> > whatever
> > >     operation causes the relevant parts of the index to be read. In
> this
> > case,
> > >     probably just facet on all the fields you care about. I'd add
> > sorting too
> > >     if you sort on different fields.
> > >
> > >     The *:* query without facets or sorting does virtually nothing due
> > to some
> > >     special handling...
> > >
> > >     On Tue, Jun 16, 2020, 10:48 James Bodkin <
> > james.bod...@loveholidays.com>
> > >     wrote:
> > >
> > >     > I've been trying to build a query that I can use in newSearcher
> > based off
> > >     > the information in your previous e-mail. I thought you meant to
> > build a *:*
> > >     > query as per Query 1 in my previous e-mail but I'm still seeing
> the
> > >     > first-hit execution.
> > >     > Now I'm wondering if you meant to create a *:* query with each of
> > the
> > >     > fields as part of the fl query parameters or a *:* query with
> each
> > of the
> > >     > fields and values as part of the fq query parameters.
> > >     >
> > >     > At the moment I've been running these manually as I expected that
> > I would
> > >     > see the first-execution penalty disappear by the time I got to
> > query 4, as
> > >     > I thought this would replicate the actions of the newSearcher.
> > >     > Unfortunately we can't use the autowarm count that is available
> as
> > part of
> > >     > the queryResultCache/filterCache due to the custom deployment
> mechanism
> > we use
> > >     > to update our index.
> > >     >
> > >     > Kind Regards,
> > >     >
> > >     > James Bodkin
> > >     >
> > >     > On 16/06/2020, 15:30, "Erick Erickson" <erickerick...@gmail.com
> >
> > wrote:
> > >     >
> > >     >     Did you try the autowarming like I mentioned in my previous
> > e-mail?
> > >     >
> > >     >     > On Jun 16, 2020, at 10:18 AM, James Bodkin <
> > >     > james.bod...@loveholidays.com> wrote:
> > >     >     >
> > >     >     > We've changed the schema to enable docValues for these
> > fields and
> > >     > this led to an improvement in the response time. We found a
> further
> > >     > improvement by also switching off indexed as these fields are
> used
> > for
> > >     > faceting and filtering only.
> > >     >     > Since those changes, we've found that the first-execution
> for
> > >     > queries is really noticeable. I thought this would be the
> > filterCache based
> > >     > on what I saw in NewRelic however it is probably trying to read
> the
> > >     > docValues from disk. How can we use the autowarming to improve
> > this?
> > >     >     >
> > >     >     > For example, I've run the following queries in sequence and
> > each
> > >     > query has a first-execution penalty.
> > >     >     >
> > >     >     > Query 1:
> > >     >     >
> > >     >     > q=*:*
> > >     >     > facet=true
> > >     >     > facet.field=D_DepartureAirport
> > >     >     > facet.field=D_Destination
> > >     >     > facet.limit=-1
> > >     >     > rows=0
> > >     >     >
> > >     >     > Query 2:
> > >     >     >
> > >     >     > q=*:*
> > >     >     > fq=D_DepartureAirport:(2660)
> > >     >     > facet=true
> > >     >     > facet.field=D_Destination
> > >     >     > facet.limit=-1
> > >     >     > rows=0
> > >     >     >
> > >     >     > Query 3:
> > >     >     >
> > >     >     > q=*:*
> > >     >     > fq=D_DepartureAirport:(2661)
> > >     >     > facet=true
> > >     >     > facet.field=D_Destination
> > >     >     > facet.limit=-1
> > >     >     > rows=0
> > >     >     >
> > >     >     > Query 4:
> > >     >     >
> > >     >     > q=*:*
> > >     >     > fq=D_DepartureAirport:(2660+OR+2661)
> > >     >     > facet=true
> > >     >     > facet.field=D_Destination
> > >     >     > facet.limit=-1
> > >     >     > rows=0
> > >     >     >
> > >     >     > We've kept the field type as a string, as the value is
> > mapped by
> > >     > the application that accesses Solr. In the examples above, the values
> > are
> > >     > mapped to airports and destinations.
> > >     >     > Is it possible to prewarm the above queries without having
> > to define
> > >     > all the potential filters manually in the auto warming?
> > >     >     >
> > >     >     > At the moment, we update and optimise our index in a
> > different
> > >     > environment and then copy the index to our production instances
> by
> > using a
> > >     > rolling deployment in Kubernetes.
> > >     >     >
> > >     >     > Kind Regards,
> > >     >     >
> > >     >     > James Bodkin
> > >     >     >
> > >     >     > On 12/06/2020, 18:58, "Erick Erickson" <
> > erickerick...@gmail.com>
> > >     > wrote:
> > >     >     >
> > >     >     >    I question whether filterCache has anything to do with
> it,
> > I
> > >     > suspect what’s really happening is that first time you’re reading
> > the
> > >     > relevant bits from disk into memory. And to double check you
> > should have
> > >     > docValues enabled for all these fields. The “uninverting” process
> > can be
> > >     > very expensive, and docValues bypasses that.
> > >     >     >
> > >     >     >    As of Solr 7.6, you can define “uninvertible=true” to
> your
> > >     > field(Type) to “fail fast” if Solr needs to uninvert the field.
> > >     >     >
> > >     >     >    But that’s an aside. In either case, my claim is that
> > first-time
> > >     > execution does “something”, either reads the serialized docValues
> > from disk
> > >     > or uninverts the field on Solr’s heap.
> > >     >     >
> > >     >     >    You can have this autowarmed by any combination of
> > >     >     >    1> specifying an autowarm count on your
> queryResultCache.
> > That’s
> > >     > hit or miss, as it replays the most recent N queries which may or
> > may not
> > >     > contain the sorts. That said, specifying 10-20 for autowarm count
> > is
> > >     > usually a good idea, assuming you’re not committing more than,
> > say, every
> > >     > 30 seconds. I’d add the same to filterCache too.
> > >     >     >
> > >     >     >    2> specifying a newSearcher or firstSearcher query in
> > >     > solrconfig.xml. The difference is that newSearcher is fired every
> > time a
> > >     > commit happens, while firstSearcher is only fired when Solr
> > starts, the
> > >     > theory being that there’s no cache autowarming available when
> Solr
> > first
> > >     > powers up. Usually, people don’t bother with firstSearcher or
> just
> > make it
> > >     > the same as newSearcher. Note that a query doesn’t have to be
> > “real” at
> > >     > all. You can just add all the facet fields to a *:* query in a
> > single go.
> > >     >     >
> > >     >     >    BTW, Trie fields will stay around for a long time even
> > though
> > >     > deprecated. Or at least until we find something to replace them
> > with that
> > >     > doesn’t have this penalty, so I’d feel pretty safe using those
> and
> > they’ll
> > >     > be more efficient than strings.
> > >     >     >
> > >     >     >    Best,
> > >     >     >    Erick
> > >     >     >
> > >     >
> > >     >
> >
>

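The warming advice quoted above (a newSearcher or firstSearcher query that
simply facets on every field you care about) can be sketched as a small
helper that assembles the request parameters. This is an illustration
only: build_warming_params is a hypothetical name, the field names are
the ones from Query 1 in the thread, and the resulting parameters would
still need to be copied into the listener configuration in solrconfig.xml
by hand.

```python
from urllib.parse import urlencode


def build_warming_params(facet_fields):
    """Return Solr parameters for a match-all query faceting on each field.

    Hypothetical helper, not part of Solr or any client library.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),           # no documents needed, only facet counts
        ("facet", "true"),
        ("facet.limit", "-1"),   # same setting as the queries in the thread
    ]
    # One facet.field parameter per field that should be touched on warm-up.
    params.extend(("facet.field", name) for name in facet_fields)
    return params


fields = ["D_DepartureAirport", "D_Destination"]
print(urlencode(build_warming_params(fields)))
```

Listing all facet fields in a single request follows the suggestion that
the warming query does not have to be "real"; it only has to trigger the
reads that a first user query would otherwise pay for.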