Revas: Facet queries are just queries that are constrained by the total result set of your primary query, so the answer to that would be the same as speeding up regular queries. As far as range facets are concerned, I believe they _do_ use docValues. After all, they have to answer the exact same question: for doc X in the result set, what is the value of field Y? The only difference is that a range facet has to bucket a bunch of them.
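To make the "look up the value per doc, then bucket" idea concrete, here is a minimal Python sketch. It is an illustration only, not how Solr or Lucene implement range facets: the dictionary standing in for docValues, the function name `range_facet`, and the field name `price` are all made up for the example.

```python
from collections import Counter

# Hypothetical stand-in for a docValues column: doc id -> value of field "price".
price_doc_values = {0: 5, 1: 12, 2: 27, 3: 31, 4: 12}


def range_facet(result_set, doc_values, start, end, gap):
    """Count the docs in result_set per [lo, lo + gap) bucket within [start, end)."""
    counts = Counter()
    for doc_id in result_set:
        # "For doc X in the result set, what is the value of field Y?"
        value = doc_values.get(doc_id)
        if value is None or not (start <= value < end):
            continue
        # Bucketing is the only extra step compared with plain faceting.
        bucket_lo = start + ((value - start) // gap) * gap
        counts[bucket_lo] += 1
    return dict(sorted(counts.items()))


# Docs 0, 1, 2 and 4 matched the primary query; doc 3 did not.
facets = range_facet([0, 1, 2, 4], price_doc_values, start=0, end=40, gap=10)
print(facets)
```

The loop only ever touches docs that are already in the result set, which is why the cost of the facet follows the cost of the primary query.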
Rahul: Please don’t hijack threads, it makes it difficult to find things later. Start a separate e-mail thread.

The answer to your question is, of course, “it depends” on a number of things and changes with the query.

First of all, multivalued fields don’t qualify because docValues are a sorted set, meaning the return is sorted and deduplicated. So if the input has the values b c d c d, what you’d get back from DV is b c d.

So let’s go with primitive, single-valued types. It still depends, but Solr does the right thing, or tries. Here’s the scoop.

Stored fields for any single doc are stored as a contiguous, compressed bit of memory. So if any _one_ field needs to be read from the stored data, the entire block is decompressed and Solr will preferentially fetch the value from the decompressed data as it’s pretty certain to be at least as cheap as fetching from DV.

However, the reverse is true if _all_ the returned values are single-valued DV fields. Then it’s more efficient to fetch the DV values as they’re MMapped, and won’t cost the seek-and-decompress cycle.

Unless space is a real consideration for you, I’d set both index and docValues to true…

Best,
Erick

> On May 20, 2020, at 10:45 AM, Rahul Goswami <rahul196...@gmail.com> wrote:
>
> Eric,
> Thanks for that explanation. I have a follow up question on that. I find
> the scenario of stored=true and docValues=true to be tricky at times...
> would like to know when is each of these scenarios preferred over the other
> two for primitive datatypes:
>
> 1) stored=true and docValues=false
> 2) stored=false and docValues=true
> 3) stored=true and docValues=true
>
> Thanks,
> Rahul
>
> On Tue, May 19, 2020 at 5:55 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> They are _absolutely_ able to be used together. Background:
>>
>> “In the bad old days”, there was no docValues.
>> So whenever you needed to facet/sort/group/use function queries Solr
>> (well, Lucene) had to take the inverted structure resulting from
>> “index=true” and “uninvert” it on the Java heap.
>>
>> docValues essentially does the “uninverting” at index time and puts
>> that structure in a separate file for each segment. So rather than uninvert
>> the index on the heap, Lucene can just read it in from disk in
>> MMapDirectory (i.e. OS) memory space.
>>
>> The downside is that your index will be bigger when you do both, that is
>> the size on disk will be bigger. But, it’ll be much faster to load, much
>> faster to autowarm, and will move the structures necessary to do
>> faceting/sorting/etc into OS memory where the garbage collection is vastly
>> more efficient than Java’s.
>>
>> And frankly I don’t think the increased size on disk is a downside. You’ll
>> have to have the memory anyway, and having it used on the OS memory space is
>> so much more efficient than on Java’s heap that it’s a win-win IMO.
>>
>> Oh, and if you never sort/facet/group/use function queries, then the
>> docValues structures are never even read into MMapDirectory space.
>>
>> So yes, freely do both.
>>
>> Best,
>> Erick
>>
>>
>>> On May 19, 2020, at 5:41 PM, matthew sporleder <msporle...@gmail.com> wrote:
>>>
>>> You can index AND docvalue? For some reason I thought they were exclusive
>>>
>>> On Tue, May 19, 2020 at 5:36 PM Erick Erickson <erickerick...@gmail.com> wrote:
>>>>
>>>> Yes. You should also index them….
>>>>
>>>> Here’s the way I think of it.
>>>>
>>>> For questions “For term X, which docs contain that value?” means index=true. This is a search.
>>>>
>>>> For questions “Does doc X have value Y in field Z”, means docValues=true.
>>>>
>>>> what’s the difference? Well, the first one is to get the result set. The second is for, given a result set,
>>>> count/sort/whatever.
>>>>
>>>> fq clauses are searches, so index=true.
>>>>
>>>> sorting, faceting, grouping and function queries are “for each doc in the result set, what values does field Y contain?”
>>>>
>>>> Maybe that made things clear as mud, but it’s the way I think of it ;)
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>>
>>>>
>>>> fq clauses are searches. Indexed=true is for searching.
>>>>
>>>> sort
>>>>
>>>>> On May 19, 2020, at 4:00 PM, matthew sporleder <msporle...@gmail.com> wrote:
>>>>>
>>>>> I have quite a few numeric / meta-data type fields in my schema and
>>>>> pretty much only use them in fq=, sort=, and friends. Should I always
>>>>> use DocValue on these if I never plan to q=search: on them? Are there
>>>>> any drawbacks?
>>>>>
>>>>> Thanks,
>>>>> Matt
>>>>
>>
>>
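For anyone finding this thread later, the two questions discussed above ("for term X, which docs contain it?" versus "for doc X, what is the value of field Y?") can be sketched as two small data structures. This is a toy Python model under stated assumptions, not Lucene's actual on-disk format: the sample documents and the helper names `build_inverted_index` and `uninvert` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical single-valued field "color" for four docs: doc id -> value.
docs = {0: "red", 1: "blue", 2: "red", 3: "green"}


def build_inverted_index(docs):
    """term -> list of doc ids. Answers: for term X, which docs contain it?"""
    inverted = defaultdict(list)
    for doc_id, value in docs.items():
        inverted[value].append(doc_id)
    return dict(inverted)


def uninvert(inverted, num_docs):
    """doc id -> term. Answers: for doc X, what is the value of the field?

    In the thread's terms, docValues builds this column at index time,
    instead of rebuilding it from the inverted structure on the heap.
    """
    column = [None] * num_docs
    for term, postings in inverted.items():
        for doc_id in postings:
            column[doc_id] = term
    return column


inverted = build_inverted_index(docs)
column = uninvert(inverted, len(docs))

# Searching (q= or fq=) uses the inverted structure to get the result set.
result_set = inverted["red"] + inverted["blue"]

# Sorting that result set needs the per-doc column instead.
sorted_hits = sorted(result_set, key=lambda doc_id: column[doc_id])
print(result_set, sorted_hits)
```

Searching never touches `column` and sorting never touches `inverted`, which is the reason a field used in both fq= and sort= benefits from having both structures.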