Re: facets & docValues

Erick Erickson Thu, 16 Apr 2020 10:50:13 -0700

DocValues should help when faceting over fields, i.e. facet.field=blah.

I would expect docValues to help with sub facets, but I don’t know
the code well enough to say definitively one way or the other.

The empirical approach would be to set “uninvertible=false” (Solr 7.6+) and
turn docValues off, keeping indexed=true. Then, if any operation tries to
uninvert the field on the Java heap, you’ll get an exception like:
“can not sort on a field w/o docValues unless it is indexed=true
uninvertible=true and the type supports Uninversion:”

See SOLR-12962
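As a minimal sketch of that experiment, the Python below only builds the JSON body for a Schema API `replace-field` command; nothing is sent to Solr. The field name `category_facet` and the `string` type are hypothetical. Setting `uninvertible` to false is what makes any attempt to uninvert the field fail with the exception above instead of quietly using heap.

```python
import json


def docvalues_off_field(name: str, field_type: str = "string") -> dict:
    """Build a Schema API "replace-field" command for the experiment:
    keep the inverted index, drop docValues, and forbid uninversion so any
    operation that would uninvert on the heap fails loudly instead."""
    return {
        "replace-field": {
            "name": name,           # hypothetical field name
            "type": field_type,     # assumed to be an existing field type
            "indexed": True,        # keep the inverted index for searching
            "stored": False,
            "docValues": False,     # the experiment: no docValues
            "uninvertible": False,  # Solr 7.6+: refuse to uninvert on the heap
        }
    }


payload = docvalues_off_field("category_facet")
print(json.dumps(payload, indent=2))
```

You would POST that body to the collection's `/schema` endpoint. Since changing docValues changes what is written to the index, expect to reindex before measuring anything.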

Speed is only one issue. The entire point of docValues is to avoid
“uninverting” the field on the heap, which used to cause very significant
memory pressure. So by turning docValues off, you risk reverting to the
old behavior: unexpected memory consumption, not to mention slowdowns
whenever the uninversion takes place.
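To make “uninverting” concrete, here is a toy Python model, not Solr's actual implementation. An inverted index maps each term to the docs containing it, while faceting needs the opposite, a doc-to-value column. Without docValues that column has to be rebuilt by walking every term, which is the heap cost described above; with docValues it is written at index time. The last function also shows why searching a docValues-only field amounts to a scan over every document. All names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Toy inverted index (what indexed=true provides): term -> doc ids containing it.
inverted: Dict[str, List[int]] = {
    "red": [0, 3],
    "green": [1],
    "blue": [2, 4],
}


def uninvert(index: Dict[str, List[int]], num_docs: int) -> List[Optional[str]]:
    """Rebuild a doc -> value column by walking every term's postings.
    Roughly the work that must happen on the heap when a field has no docValues."""
    column: List[Optional[str]] = [None] * num_docs
    for term, doc_ids in index.items():
        for doc in doc_ids:
            column[doc] = term  # assumes a single-valued field
    return column


def facet_counts(column: List[Optional[str]], matching_docs: List[int]) -> Dict[str, int]:
    """Facet on the docs matching a query by reading the column directly."""
    counts: Dict[str, int] = defaultdict(int)
    for doc in matching_docs:
        value = column[doc]
        if value is not None:
            counts[value] += 1
    return dict(counts)


def search_column(column: List[Optional[str]], term: str) -> List[int]:
    """Searching a column-only (docValues-only) field: every doc is checked,
    much like a table scan. Compare with the direct lookup inverted[term]."""
    return [doc for doc, value in enumerate(column) if value == term]


column = uninvert(inverted, num_docs=5)
print(column)                           # ['red', 'green', 'blue', 'red', 'blue']
print(facet_counts(column, [0, 2, 3]))  # {'red': 2, 'blue': 1}
print(search_column(column, "blue"))    # [2, 4]
```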

Also, unless your documents are very large, this is a tiny corpus. It can be
quite hard to get realistic numbers from it; the signal gets lost in the noise.
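One way to keep the noise from swamping the signal is to discard warm-up runs and compare percentiles over many repetitions rather than single timings. A minimal Python sketch, where `run_query` is a hypothetical placeholder for whatever sends the facet request:

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Run `run_query` repeatedly and summarize latencies in milliseconds.

    `run_query` is a hypothetical stand-in for whatever sends the request.
    Warm-up calls are discarded so cache population doesn't skew the results.
    `runs` must be at least 2 for the percentile and stdev calculations.
    """
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(samples, n=100)[94]
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": p95,
        "stdev_ms": statistics.stdev(samples),
    }


# Example with a dummy workload in place of a real Solr request.
stats = time_query(lambda: sum(range(10_000)))
print(stats)
```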

You should only shard when your individual query times exceed your
requirement. Say you have a 95th-percentile requirement of 1-second response time.

Let’s further say that you can meet that requirement at 50 queries/second,
but at 75 queries/second your response time exceeds it. Do NOT shard
at this point; add another replica instead. As a general rule, sharding
adds inevitable overhead and should only be considered when you can’t
get adequate response times even under fairly light query loads.
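The rule of thumb can be written down as a tiny decision function. This Python sketch is illustrative only: the function and parameter names are hypothetical, and it assumes throughput grows roughly linearly with replica count, which real clusters only approximate.

```python
import math


def scaling_advice(p95_light_load_ms: float, requirement_ms: float,
                   max_qps_per_replica: float, target_qps: float) -> str:
    """Rule of thumb: shard only if single queries are too slow even under
    light load; otherwise add replicas to absorb more queries per second.

    max_qps_per_replica: highest load at which one replica still meets the
    requirement. Assumes throughput scales roughly linearly with replicas.
    """
    if p95_light_load_ms > requirement_ms:
        return "shard: queries are too slow even under light load"
    replicas = max(1, math.ceil(target_qps / max_qps_per_replica))
    return f"add replicas: {replicas} replica(s) for {target_qps:g} qps"


print(scaling_advice(400, 1000, 50, 75))   # add replicas: 2 replica(s) for 75 qps
print(scaling_advice(1500, 1000, 50, 75))  # shard: queries are too slow even under light load
```

With the numbers from the example above (a hypothetical 400 ms p95 under light load), the function suggests a second replica rather than a second shard.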

Best,
Erick

> On Apr 16, 2020, at 12:08 PM, Revas <revas2...@gmail.com> wrote:
> 
> Hi Erick, You are correct: we have only about 1.8M documents so far, and
> turning on indexing for the facet fields greatly improved the timings of
> the facet query (which has sub facets and facet queries). So do
> docValues help at all for sub facets and facet queries? Our tests
> revealed a further query-time improvement when we turned off docValues.
> Is that the right approach?
> 
> Currently we have only 1 shard, and we are thinking of scaling by
> increasing the number of shards when we see query times deteriorate.
> Any suggestions?
> 
> Thanks.
> 
> 
> On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> In a word, “yes”. I also suspect your corpus isn’t very big.
>> 
>> I think the key is the facet queries. Now, I’m talking from
>> theory rather than diving into the code, but querying on
>> a docValues=true, indexed=false field is really doing a
>> search. And searching on a field like that is effectively
>> analogous to a table scan. Even if somehow an internal
>> structure would be constructed to deal with it, it would
>> probably be on the heap, where you don’t want it.
>> 
>> So the test would be to take the queries out and measure
>> performance, but I think that’s the root issue here.
>> 
>> Best,
>> Erick
>> 
>>> On Apr 14, 2020, at 11:51 PM, Revas <revas2...@gmail.com> wrote:
>>> 
>>> We have faceting fields that have been defined as indexed=false,
>>> stored=false and docValues=true.
>>>
>>> However, we use a lot of sub facets using JSON facets, and facet ranges
>>> using facet.queries. We see that after every soft commit our performance
>>> worsens, and it performs well between commits.
>>>
>>> How are docValues fields affected by soft commits, and do we need to
>>> enable indexing if we use sub facets and facet queries to improve
>>> performance?
>>> 
>>> Thanks
>> 
>>
