Ah, interesting! So if the number of possible values is low (say, <= 10), it is faster to *not* use docValues on that (indexed) faceted field? Does this hold true even when using faceting techniques like tag and exclusion?

Thanks,
Anthony

On Wed, Jun 17, 2020 at 9:37 AM David Smiley <david.w.smi...@gmail.com> wrote:

> I strongly recommend setting indexed=true on a field you facet on for the
> purposes of efficient refinement (fq=field:value). But it strictly isn't
> required, as you have discovered.
>
> ~ David
>
> On Wed, Jun 17, 2020 at 9:02 AM Michael Gibney <mich...@michaelgibney.net>
> wrote:
>
> > facet.method=enum works by executing a query (against indexed values)
> > for each indexed value in a given field (which, for indexed=false, is
> > "no values"). So that explains why facet.method=enum no longer works.
> > I was going to suggest that you might not want to set indexed=false on
> > the docValues facet fields anyway, since the indexed values are still
> > used for facet refinement (assuming your index is distributed).
> >
> > What's the number of unique values in the relevant fields? If it's low
> > enough, setting docValues=false and indexed=true and using
> > facet.method=enum (with a sufficiently large filterCache) is
> > definitely a viable option, and will almost certainly be faster than
> > docValues-based faceting. (As an aside, noting for future reference:
> > high-cardinality facets over high-cardinality DocSet domains might be
> > able to benefit from a term facet count cache:
> > https://issues.apache.org/jira/browse/SOLR-13807)
> >
> > I think you didn't specifically mention whether you acted on Erick's
> > suggestion of setting "uninvertible=false" (I think Erick accidentally
> > said "uninvertible=true") to fail fast. I'd also recommend doing that,
> > perhaps even above all else -- it shouldn't actually *do* anything,
> > but will help ensure that things are behaving as you expect them to!
> >
> > Michael
> >
> > On Wed, Jun 17, 2020 at 4:31 AM James Bodkin
> > <james.bod...@loveholidays.com> wrote:
> >
> > > Thanks, I've implemented some queries that improve the first-hit
> > > execution for faceting.
> > >
> > > Since turning off indexed on those fields, we've noticed that
> > > facet.method=enum no longer returns the facets when used.
> > > Using facet.method=fc/fcs is significantly slower compared to
> > > facet.method=enum for us. Why do these two differences exist?
> > >
> > > On 16/06/2020, 17:52, "Erick Erickson" <erickerick...@gmail.com> wrote:
> > >
> > > Ok, I see the disconnect... Necessary parts of the index are read
> > > from disk lazily. So your newSearcher or firstSearcher query needs to
> > > do whatever operation causes the relevant parts of the index to be
> > > read. In this case, probably just facet on all the fields you care
> > > about. I'd add sorting too if you sort on different fields.
> > >
> > > The *:* query without facets or sorting does virtually nothing due to
> > > some special handling...
> > >
> > > On Tue, Jun 16, 2020, 10:48 James Bodkin <james.bod...@loveholidays.com>
> > > wrote:
> > >
> > > > I've been trying to build a query that I can use in newSearcher
> > > > based off the information in your previous e-mail. I thought you
> > > > meant to build a *:* query as per Query 1 in my previous e-mail but
> > > > I'm still seeing the first-hit execution.
> > > > Now I'm wondering if you meant to create a *:* query with each of
> > > > the fields as part of the fl query parameters or a *:* query with
> > > > each of the fields and values as part of the fq query parameters.
> > > >
> > > > At the moment I've been running these manually as I expected that I
> > > > would see the first-execution penalty disappear by the time I got
> > > > to query 4, as I thought this would replicate the actions of the
> > > > newSearcher.
> > > > Unfortunately we can't use the autowarm count that is available as
> > > > part of the filterCache/filterCache due to the custom deployment
> > > > mechanism we use to update our index.
> > > >
> > > > Kind Regards,
> > > >
> > > > James Bodkin
> > > >
> > > > On 16/06/2020, 15:30, "Erick Erickson" <erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > > Did you try the autowarming like I mentioned in my previous e-mail?
> > > >
> > > > > On Jun 16, 2020, at 10:18 AM, James Bodkin
> > > > > <james.bod...@loveholidays.com> wrote:
> > > > >
> > > > > We've changed the schema to enable docValues for these fields and
> > > > > this led to an improvement in the response time. We found a
> > > > > further improvement by also switching off indexed as these fields
> > > > > are used for faceting and filtering only.
> > > > > Since those changes, we've found that the first-execution for
> > > > > queries is really noticeable. I thought this would be the
> > > > > filterCache based on what I saw in NewRelic however it is
> > > > > probably trying to read the docValues from disk. How can we use
> > > > > the autowarming to improve this?
> > > > >
> > > > > For example, I've run the following queries in sequence and each
> > > > > query has a first-execution penalty.
> > > > >
> > > > > Query 1:
> > > > >
> > > > > q=*:*
> > > > > facet=true
> > > > > facet.field=D_DepartureAirport
> > > > > facet.field=D_Destination
> > > > > facet.limit=-1
> > > > > rows=0
> > > > >
> > > > > Query 2:
> > > > >
> > > > > q=*:*
> > > > > fq=D_DepartureAirport:(2660)
> > > > > facet=true
> > > > > facet.field=D_Destination
> > > > > facet.limit=-1
> > > > > rows=0
> > > > >
> > > > > Query 3:
> > > > >
> > > > > q=*:*
> > > > > fq=D_DepartureAirport:(2661)
> > > > > facet=true
> > > > > facet.field=D_Destination
> > > > > facet.limit=-1
> > > > > rows=0
> > > > >
> > > > > Query 4:
> > > > >
> > > > > q=*:*
> > > > > fq=D_DepartureAirport:(2660+OR+2661)
> > > > > facet=true
> > > > > facet.field=D_Destination
> > > > > facet.limit=-1
> > > > > rows=0
> > > > >
> > > > > We've kept the field type as a string, as the value is mapped by
> > > > > the application that accesses Solr. In the examples above, the
> > > > > values are mapped to airports and destinations.
> > > > > Is it possible to prewarm the above queries without having to
> > > > > define all the potential filters manually in the auto warming?
> > > > >
> > > > > At the moment, we update and optimise our index in a different
> > > > > environment and then copy the index to our production instances
> > > > > by using a rolling deployment in Kubernetes.
> > > > >
> > > > > Kind Regards,
> > > > >
> > > > > James Bodkin
> > > > >
> > > > > On 12/06/2020, 18:58, "Erick Erickson" <erickerick...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > I question whether filterCache has anything to do with it, I
> > > > > suspect what’s really happening is that first time you’re reading
> > > > > the relevant bits from disk into memory. And to double check you
> > > > > should have docValues enabled for all these fields. The
> > > > > “uninverting” process can be very expensive, and docValues
> > > > > bypasses that.
> > > > >
> > > > > As of Solr 7.6, you can define “uninvertible=true” to your
> > > > > field(Type) to “fail fast” if Solr needs to uninvert the field.
> > > > >
> > > > > But that’s an aside. In either case, my claim is that first-time
> > > > > execution does “something”, either reads the serialized docValues
> > > > > from disk or uninverts the field on Solr’s heap.
> > > > >
> > > > > You can have this autowarmed by any combination of
> > > > > 1> specifying an autowarm count on your queryResultCache. That’s
> > > > > hit or miss, as it replays the most recent N queries which may or
> > > > > may not contain the sorts. That said, specifying 10-20 for
> > > > > autowarm count is usually a good idea, assuming you’re not
> > > > > committing more than, say, every 30 seconds. I’d add the same to
> > > > > filterCache too.
> > > > >
> > > > > 2> specifying a newSearcher or firstSearcher query in
> > > > > solrconfig.xml. The difference is that newSearcher is fired every
> > > > > time a commit happens, while firstSearcher is only fired when
> > > > > Solr starts, the theory being that there’s no cache autowarming
> > > > > available when Solr first powers up. Usually, people don’t bother
> > > > > with firstSearcher or just make it the same as newSearcher. Note
> > > > > that a query doesn’t have to be “real” at all. You can just add
> > > > > all the facet fields to a *:* query in a single go.
> > > > >
> > > > > BTW, Trie fields will stay around for a long time even though
> > > > > deprecated. Or at least until we find something to replace them
> > > > > with that doesn’t have this penalty, so I’d feel pretty safe
> > > > > using those and they’ll be more efficient than strings.
> > > > >
> > > > > Best,
> > > > > Erick
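
As a rough illustration of Erick's suggestion to add all the facet fields to a single *:* query, the sketch below builds the request parameters for such a warming query. It is only a sketch under assumptions: `build_warming_params` is a hypothetical helper, not part of Solr or any client library, and the field names are simply the ones from James's Query 1. The same parameters could be sent to the select handler by hand or copied into a newSearcher/firstSearcher query in solrconfig.xml.

```python
from urllib.parse import urlencode


def build_warming_params(facet_fields):
    """Return (name, value) pairs for a *:* query that facets on every given field.

    Hypothetical helper for illustration only. It mirrors Query 1 from the
    thread: no rows are fetched, but each facet field is touched once.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.limit", "-1"),
    ]
    # Repeating facet.field once per field is how Solr accepts several facets.
    params += [("facet.field", field) for field in facet_fields]
    return params


# Field names are taken from the queries quoted in the thread.
query_string = urlencode(
    build_warming_params(["D_DepartureAirport", "D_Destination"])
)
print(query_string)
```

A list of pairs is used instead of a dict because facet.field has to appear once per field, and a dict would keep only one of them.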