I pretty much agree with your business side. The rough size of a docValues field is one value of type X for each doc. So say you have an int field: the size is close to maxDoc * 4 bytes. This isn't totally accurate (there's some int packing done, for instance), but it'll do. If you really want an accurate count, compare the before/after sizes of the *.dvd and *.dvm segment files in your index.
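As a rough illustration of that arithmetic and of the before/after comparison, here is a minimal Python sketch. The function names are hypothetical, the 4-bytes-per-value figure is the int-field approximation above (Lucene's packing can make the real size smaller), and the directory you pass in is assumed to be your core's index directory. If your segments use the compound file format (.cfs), their docValues data lives inside those files and this sum will not see it.

```python
from pathlib import Path


def estimate_docvalues_bytes(max_doc: int, bytes_per_value: int = 4) -> int:
    """Rough docValues size for a single-valued numeric field.

    Assumes one value per document; 4 bytes is the int approximation.
    Lucene packs values, so the real on-disk size can be smaller.
    """
    return max_doc * bytes_per_value


def docvalues_file_bytes(index_dir: str) -> int:
    """Sum the sizes of the *.dvd (data) and *.dvm (metadata) files in index_dir.

    Segments written in compound-file format (.cfs) keep their docValues
    inside the .cfs file, so they are not counted here.
    """
    total = 0
    for path in Path(index_dir).iterdir():
        if path.is_file() and path.suffix in (".dvd", ".dvm"):
            total += path.stat().st_size
    return total
```

For 10 million docs, estimate_docvalues_bytes(10_000_000) gives 40,000,000 bytes (about 40 MB) per int field. For the measured number, run docvalues_file_bytes on the index directory before and after reindexing with docValues enabled and take the difference.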
However, it's "pay me now or pay me later". The critical operations are faceting, grouping and sorting. If you do any of those operations on a field that is _not_ docValues=true, the field will be uninverted on the _Java heap_, where it consumes GC cycles, puts pressure on all your other operations, etc. This happens _every_ time you open a new searcher and use those fields.

If the field _does_ have docValues=true, the data is held in the OS's memory space (the filesystem cache), _not_ the JVM's heap, because Solr uses MMapDirectory (see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html). Among other virtues, it can be swapped out (you don't want it to be, but that's still better than OOMing). Plus loading it is just reading it off disk rather than running the expensive uninversion process. And if you don't do any of those operations (grouping, sorting and faceting), the bits just sit there on disk doing nothing.

So say you carefully define which fields will be used for any of the three operations and enable docValues only on those. Then 3 months later the business side comes back with "oh, we need to facet on another field". Your choices are:
1> live with the increased heap usage and other resource contention, perhaps panicking along the way because your processes OOM and prod goes down.
or
2> reindex from scratch, starting with a totally new collection.

And note the fragility here. Your application can be humming along just fine for months. Then one fine day someone innocently submits a query that sorts on a new field that has docValues=false and B-OOM.

If (and only if) you can _guarantee_ that fieldX will never be used for any of the three operations, then turning off docValues for that field will save you some disk space. But that's the only advantage. Well, alright, if you have to do a full index replication, that'll happen a bit faster too. So I prefer to err on the side of caution.
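One way to blunt that fragility is a pre-flight check on requests before they reach Solr. The Python sketch below is hypothetical and not a Solr feature: it assumes you keep a map of field names to their docValues setting (FIELD_HAS_DOCVALUES here is made up; in practice you'd derive it from your schema) and that request params arrive as lists of strings. It only understands plain "field asc, field2 desc" sorts and bare facet.field/group.field values; function-query sorts and local params are not handled.

```python
from typing import Dict, Iterable, List, Mapping

# Hypothetical field-name -> docValues flag map; derive it from your schema.
FIELD_HAS_DOCVALUES: Dict[str, bool] = {
    "price": True,
    "category": True,
    "title": False,
}


def fields_needing_docvalues(params: Mapping[str, Iterable[str]]) -> List[str]:
    """Collect field names a request would sort, facet or group on."""
    fields: List[str] = []
    for sort_spec in params.get("sort", []):
        # Plain Solr sort syntax: "field1 asc, field2 desc".
        for clause in sort_spec.split(","):
            clause = clause.strip()
            if clause:
                fields.append(clause.split()[0])
    fields.extend(params.get("facet.field", []))
    fields.extend(params.get("group.field", []))
    return fields


def risky_fields(
    params: Mapping[str, Iterable[str]],
    schema: Mapping[str, bool] = FIELD_HAS_DOCVALUES,
) -> List[str]:
    """Fields that would be uninverted on the heap (docValues off or unknown)."""
    return sorted({f for f in fields_needing_docvalues(params) if not schema.get(f, False)})
```

For example, risky_fields({"sort": ["price asc, title desc"], "facet.field": ["category"]}) flags only "title". Treating unknown fields as risky is a deliberate choice: a field nobody told you about is exactly the "oh, we need to facet on another field" case.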
I recommend making fields docValues=true unless I can absolutely guarantee (and the business side _also_ agrees) either
1> that fieldX will never be used for sorting, grouping or faceting, or
2> if they can't promise that, that they'll give me time to completely reindex.

Best,
Erick

On Wed, Jun 13, 2018 at 4:30 PM, root23 <s.manuj...@gmail.com> wrote:
> Hi all,
> Does anyone know how much the index size typically increases when we enable docValues on a field?
> Our business side wants to enable sorting on most of our fields. I am
> trying to push back, saying that it will increase the index size, since
> enabling docValues will create the uninverted index.
>
> I know the size probably depends on what values are in the fields, but I need
> a general idea so that I can convince them that enabling it on the fields is
> costly and will incur this much cost.
>
> If anyone knows how to find this out by looking at an existing Solr index that
> has docValues enabled, that would also be a great help.
>
> Thanks !!!