Erick,

Thanks for that explanation. I have a follow-up question. I find the
combination of stored=true and docValues=true tricky at times. I would like
to know when each of these scenarios is preferred over the other two for
primitive datatypes:

1) stored=true and docValues=false
2) stored=false and docValues=true
3) stored=true and docValues=true

Thanks,
Rahul

On Tue, May 19, 2020 at 5:55 PM Erick Erickson <erickerick...@gmail.com> wrote:

> They are _absolutely_ able to be used together. Background:
>
> “In the bad old days”, there was no docValues. So whenever you needed
> to facet/sort/group/use function queries, Solr (well, Lucene) had to take
> the inverted structure resulting from “indexed=true” and “uninvert” it on
> the Java heap.
>
> docValues essentially does the “uninverting” at index time and puts
> that structure in a separate file for each segment. So rather than uninvert
> the index on the heap, Lucene can just read it in from disk in MMapDirectory
> (i.e. OS) memory space.
>
> The downside is that your index will be bigger when you do both, that is,
> the size on disk will be bigger. But it’ll be much faster to load, much
> faster to autowarm, and will move the structures necessary to do
> faceting/sorting/etc. into OS memory where the garbage collection is vastly
> more efficient than Java’s.
>
> And frankly I don’t think the increased size on disk is a downside. You’ll
> have to have the memory anyway, and having it used in the OS memory space is
> so much more efficient than on Java’s heap that it’s a win-win IMO.
>
> Oh, and if you never sort/facet/group/use function queries, then the
> docValues structures are never even read into MMapDirectory space.
>
> So yes, freely do both.
>
> Best,
> Erick
>
> > On May 19, 2020, at 5:41 PM, matthew sporleder <msporle...@gmail.com> wrote:
> >
> > You can index AND docvalue? For some reason I thought they were exclusive
> >
> > On Tue, May 19, 2020 at 5:36 PM Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >> Yes. You should also index them….
> >>
> >> Here’s the way I think of it.
> >>
> >> Questions of the form “For term X, which docs contain that value?” mean
> >> indexed=true. This is a search.
> >>
> >> Questions of the form “Does doc X have value Y in field Z?” mean
> >> docValues=true.
> >>
> >> What’s the difference? Well, the first one is to get the result set.
> >> The second is for, given a result set, count/sort/whatever.
> >>
> >> fq clauses are searches, so indexed=true.
> >>
> >> Sorting, faceting, grouping and function queries are “for each doc in
> >> the result set, what values does field Y contain?”
> >>
> >> Maybe that made things clear as mud, but it’s the way I think of it ;)
> >>
> >> Best,
> >> Erick
> >>
> >> fq clauses are searches. Indexed=true is for searching.
> >>
> >> sort
> >>
> >>> On May 19, 2020, at 4:00 PM, matthew sporleder <msporle...@gmail.com> wrote:
> >>>
> >>> I have quite a few numeric / meta-data type fields in my schema and
> >>> pretty much only use them in fq=, sort=, and friends. Should I always
> >>> use DocValue on these if i never plan to q=search: on them? Are there
> >>> any drawbacks?
> >>>
> >>> Thanks,
> >>> Matt
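
The distinction drawn in the thread (a search walks the inverted index from
term to docs, while sort/facet/group reads a value per doc) can be
illustrated with a toy Python model. This is only a sketch under stated
assumptions: it is not Lucene's actual on-disk layout or API, and the sample
corpus, the plain-dict structures, and the function names (build_inverted,
build_doc_values, uninvert, sort_by_field, facet_counts) are all hypothetical
and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical corpus: doc id -> value of one numeric field.
docs = {0: 30, 1: 10, 2: 30, 3: 20}


def build_inverted(docs):
    """Analogue of indexed=true: value -> list of doc ids containing it."""
    inverted = defaultdict(list)
    for doc_id in sorted(docs):
        inverted[docs[doc_id]].append(doc_id)
    return dict(inverted)


def build_doc_values(docs):
    """Analogue of docValues=true: doc id -> value, built at index time."""
    return dict(docs)


def uninvert(inverted):
    """Without docValues, the doc id -> value column has to be rebuilt
    from the inverted structure at query time (in memory)."""
    column = {}
    for value, doc_ids in inverted.items():
        for doc_id in doc_ids:
            column[doc_id] = value
    return column


def sort_by_field(doc_ids, column):
    """For each doc in the result set, look up its value and sort by it."""
    return sorted(doc_ids, key=lambda doc_id: column[doc_id])


def facet_counts(doc_ids, column):
    """Count how many docs in the result set carry each value."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        counts[column[doc_id]] += 1
    return dict(counts)


inverted = build_inverted(docs)
doc_values = build_doc_values(docs)

# fq-style question: "for value 30, which docs contain it?"
result_set = inverted.get(30, [])

# sort/facet-style question: "for each doc, what value does the field hold?"
sorted_docs = sort_by_field([0, 1, 3], doc_values)
facets = facet_counts(list(docs), doc_values)
```

In this toy model, uninvert(inverted) yields the same mapping as doc_values;
the difference is only when the work is done. That mirrors the point made in
the thread: docValues does the "uninverting" once at index time, so nothing
has to be rebuilt on the heap when a sort or facet request arrives.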