I strongly recommend setting indexed=true on any field you facet on, so that refinement (fq=field:value) is efficient. Strictly speaking it isn't required, though, as you have discovered.
~ David

On Wed, Jun 17, 2020 at 9:02 AM Michael Gibney <mich...@michaelgibney.net> wrote:

> facet.method=enum works by executing a query (against indexed values) for each indexed value in a given field (which, for indexed=false, is "no values"). So that explains why facet.method=enum no longer works. I was going to suggest that you might not want to set indexed=false on the docValues facet fields anyway, since the indexed values are still used for facet refinement (assuming your index is distributed).
>
> What's the number of unique values in the relevant fields? If it's low enough, setting docValues=false and indexed=true and using facet.method=enum (with a sufficiently large filterCache) is definitely a viable option, and will almost certainly be faster than docValues-based faceting. (As an aside, noting for future reference: high-cardinality facets over high-cardinality DocSet domains might be able to benefit from a term facet count cache: https://issues.apache.org/jira/browse/SOLR-13807)
>
> I think you didn't specifically mention whether you acted on Erick's suggestion of setting "uninvertible=false" (I think Erick accidentally said "uninvertible=true") to fail fast. I'd also recommend doing that, perhaps even above all else -- it shouldn't actually *do* anything, but will help ensure that things are behaving as you expect them to!
>
> Michael
>
> On Wed, Jun 17, 2020 at 4:31 AM James Bodkin <james.bod...@loveholidays.com> wrote:
> >
> > Thanks, I've implemented some queries that improve the first-hit execution for faceting.
> >
> > Since turning off indexed on those fields, we've noticed that facet.method=enum no longer returns the facets when used.
> > Using facet.method=fc/fcs is significantly slower compared to facet.method=enum for us. Why do these two differences exist?
> >
> > On 16/06/2020, 17:52, "Erick Erickson" <erickerick...@gmail.com> wrote:
> >
> > Ok, I see the disconnect...
> > Necessary parts of the index are read from disk lazily. So your newSearcher or firstSearcher query needs to do whatever operation causes the relevant parts of the index to be read. In this case, probably just facet on all the fields you care about. I'd add sorting too if you sort on different fields.
> >
> > The *:* query without facets or sorting does virtually nothing due to some special handling...
> >
> > On Tue, Jun 16, 2020, 10:48 James Bodkin <james.bod...@loveholidays.com> wrote:
> >
> > > I've been trying to build a query that I can use in newSearcher based off the information in your previous e-mail. I thought you meant to build a *:* query as per Query 1 in my previous e-mail but I'm still seeing the first-hit execution.
> > > Now I'm wondering if you meant to create a *:* query with each of the fields as part of the fl query parameters or a *:* query with each of the fields and values as part of the fq query parameters.
> > >
> > > At the moment I've been running these manually as I expected that I would see the first-execution penalty disappear by the time I got to query 4, as I thought this would replicate the actions of the newSearcher.
> > > Unfortunately we can't use the autowarm count that is available as part of the filterCache/filterCache due to the custom deployment mechanism we use to update our index.
> > >
> > > Kind Regards,
> > >
> > > James Bodkin
> > >
> > > On 16/06/2020, 15:30, "Erick Erickson" <erickerick...@gmail.com> wrote:
> > >
> > > Did you try the autowarming like I mentioned in my previous e-mail?
> > >
> > > > On Jun 16, 2020, at 10:18 AM, James Bodkin <james.bod...@loveholidays.com> wrote:
> > > >
> > > > We've changed the schema to enable docValues for these fields and this led to an improvement in the response time.
> > > > We found a further improvement by also switching off indexed, as these fields are used for faceting and filtering only.
> > > > Since those changes, we've found that the first-execution penalty for queries is really noticeable. I thought this would be the filterCache based on what I saw in NewRelic; however, it is probably trying to read the docValues from disk. How can we use the autowarming to improve this?
> > > >
> > > > For example, I've run the following queries in sequence and each query has a first-execution penalty.
> > > >
> > > > Query 1:
> > > >
> > > > q=*:*
> > > > facet=true
> > > > facet.field=D_DepartureAirport
> > > > facet.field=D_Destination
> > > > facet.limit=-1
> > > > rows=0
> > > >
> > > > Query 2:
> > > >
> > > > q=*:*
> > > > fq=D_DepartureAirport:(2660)
> > > > facet=true
> > > > facet.field=D_Destination
> > > > facet.limit=-1
> > > > rows=0
> > > >
> > > > Query 3:
> > > >
> > > > q=*:*
> > > > fq=D_DepartureAirport:(2661)
> > > > facet=true
> > > > facet.field=D_Destination
> > > > facet.limit=-1
> > > > rows=0
> > > >
> > > > Query 4:
> > > >
> > > > q=*:*
> > > > fq=D_DepartureAirport:(2660+OR+2661)
> > > > facet=true
> > > > facet.field=D_Destination
> > > > facet.limit=-1
> > > > rows=0
> > > >
> > > > We've kept the field type as a string, as the value is mapped by the application that accesses Solr. In the examples above, the values are mapped to airports and destinations.
> > > > Is it possible to prewarm the above queries without having to define all the potential filters manually in the auto warming?
> > > >
> > > > At the moment, we update and optimise our index in a different environment and then copy the index to our production instances by using a rolling deployment in Kubernetes.
> > > >
> > > > Kind Regards,
> > > >
> > > > James Bodkin
> > > >
> > > > On 12/06/2020, 18:58, "Erick Erickson" <erickerick...@gmail.com> wrote:
> > > >
> > > > I question whether filterCache has anything to do with it, I suspect what’s really happening is that first time you’re reading the relevant bits from disk into memory. And to double check you should have docValues enabled for all these fields. The “uninverting” process can be very expensive, and docValues bypasses that.
> > > >
> > > > As of Solr 7.6, you can define “uninvertible=true” to your field(Type) to “fail fast” if Solr needs to uninvert the field.
> > > >
> > > > But that’s an aside. In either case, my claim is that first-time execution does “something”, either reads the serialized docValues from disk or uninverts the field on Solr’s heap.
> > > >
> > > > You can have this autowarmed by any combination of
> > > > 1> specifying an autowarm count on your queryResultCache. That’s hit or miss, as it replays the most recent N queries which may or may not contain the sorts. That said, specifying 10-20 for autowarm count is usually a good idea, assuming you’re not committing more than, say, every 30 seconds. I’d add the same to filterCache too.
> > > >
> > > > 2> specifying a newSearcher or firstSearcher query in solrconfig.xml. The difference is that newSearcher is fired every time a commit happens, while firstSearcher is only fired when Solr starts, the theory being that there’s no cache autowarming available when Solr first powers up. Usually, people don’t bother with firstSearcher or just make it the same as newSearcher. Note that a query doesn’t have to be “real” at all. You can just add all the facet fields to a *:* query in a single go.
> > > >
> > > > BTW, Trie fields will stay around for a long time even though deprecated.
> > > > Or at least until we find something to replace them with that doesn’t have this penalty, so I’d feel pretty safe using those and they’ll be more efficient than strings.
> > > >
> > > > Best,
> > > > Erick
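
For reference, the warming query Erick describes (a *:* query that facets on every field of interest) might look roughly like the following in solrconfig.xml. This is only a sketch under assumptions: the field names are taken from James's example queries, the listener is assumed to be placed inside the <query> section, and the parameters should be adjusted to the fields actually faceted or sorted on.

```xml
<!-- Sketch only: placed inside the <query> section of solrconfig.xml. -->
<!-- Field names are borrowed from the example queries in this thread. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- Match everything, return no documents; the goal is only to -->
      <!-- touch the facet data so it is read from disk. -->
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">D_DepartureAirport</str>
      <str name="facet.field">D_Destination</str>
    </lst>
  </arr>
</listener>
```

The same block can be repeated with event="firstSearcher" so that the query also runs when Solr starts. That may be the more relevant of the two when an index is copied into place and the instance is restarted, as in the rolling deployment described above.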