Depending on what your documents look like, enabling docValues might let you save space by switching to stored="false", since Solr can fetch the stored value from docValues. I say it depends on your documents and use case because reading a single field from docValues can sometimes be slower when all the other fields come from stored values. If you do not do matches, lookups or range queries on some fields, you may even be able to set indexed="false" and save space in the inverted index.
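A minimal sketch of the field definition described above, written against the Solr Schema API "add-field" command and assuming a managed schema. The field name "popularity" and the "pint" type (from the default configsets) are hypothetical placeholders. Changing docValues or indexed on an existing field still means reindexing the data.

```python
import json
import urllib.request


def docvalues_only_field(name, field_type, indexed=False):
    """Build a Schema API "add-field" command for a field whose values
    live only in docValues (stored=false)."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "docValues": True,
            "stored": False,
            # Allows fl=* to return the value from docValues even though
            # the field is not stored.
            "useDocValuesAsStored": True,
            # indexed=False drops the inverted-index entries; only safe if
            # you never match, look up or range-query on this field.
            "indexed": indexed,
        }
    }


def post_schema_command(solr_url, collection, command):
    """POST a Schema API command to a running Solr node and return the
    decoded JSON response."""
    req = urllib.request.Request(
        f"{solr_url}/solr/{collection}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, post_schema_command("http://localhost:8983", "mycollection", docvalues_only_field("popularity", "pint")) would add the field to a hypothetical collection "mycollection" on a local node.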
A benefit of having docValues enabled is that it lets you do atomic updates to your docs, re-index from an existing index (not from source), and use streaming expressions on all fields.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 14 Jun 2018, at 04:13, Erick Erickson <erickerick...@gmail.com> wrote:
>
> I pretty much agree with your business side.
>
> The rough size of a docValues field is one X for each doc. Say you have an int field: the size is near maxDoc * 4 bytes. This is not totally accurate (there is some int packing done, for instance), but it'll do. If you really want an accurate count, look at the before/after size of the *.dvd and *.dvm segment files in your index.
>
> However, it's "pay me now or pay me later". The critical operations are faceting, grouping and sorting. If you do any of those operations on a field that is _not_ docValues=true, it will be uninverted on the _Java heap_, where it will consume GC cycles, put pressure on all your other operations, etc. This process will be done _every_ time you open a new searcher and use these fields.
>
> If the field _does_ have docValues=true, it will be held in the OS's memory space, _not_ the JVM's heap, because of MMapDirectory (see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html). Among other virtues, it can be swapped out (although you don't want it to be, that's still better than OOMing). Plus, loading it is just reading it off disk rather than the expensive uninversion process.
>
> And if you don't do any of those operations (grouping, sorting and faceting), then the bits just sit there on disk doing nothing.
>
> So say you carefully define which fields will be used for any of the three operations and enable docValues on them. Then 3 months later the business side comes back with "oh, we need to facet on another field".
>
> Your choices are:
> 1> live with the increased heap usage and other resource contention, perhaps panicking along the way because your processes OOM and prod goes down,
> or
> 2> reindex from scratch, starting with a totally new collection.
>
> And note the fragility here. Your application can be humming along just fine for months. Then one fine day someone innocently submits a query that sorts on a new field that has docValues=false and B-OOM.
>
> If (and only if) you can _guarantee_ that fieldX will never be used for any of the three operations, then turning off docValues for that field will save you some disk space. But that's the only advantage. Well, alright: if you have to do a full index replication, that'll happen a bit faster too.
>
> So I prefer to err on the side of caution. I recommend making fields docValues=true unless I can absolutely guarantee (and the business _also_ agrees)
> 1> that fieldX will never be used for sorting, grouping or faceting,
> or
> 2> if they can't promise that, that they will give me time to completely reindex.
>
> Best,
> Erick
>
> On Wed, Jun 13, 2018 at 4:30 PM, root23 <s.manuj...@gmail.com> wrote:
>> Hi all,
>> Does anyone know how much the index size typically increases when we enable docValues on a field?
>> Our business side wants to enable sorting on most of our fields. I am trying to push back, saying that it will increase the index size, since enabling docValues will create the uninverted index.
>>
>> I know the size probably depends on what values are in the fields, but I need a general idea so that I can convince them that enabling it on these fields is costly and will incur this much cost.
>>
>> If anyone knows how to find this out by looking at an existing Solr index that has docValues enabled, that would also be a great help.
>>
>> Thanks!
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
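To put Erick's numbers against the original question, here is a small Python sketch. The first function applies his maxDoc * 4 bytes rule of thumb for one int field (10 million docs is roughly 40 MB per field before any packing); the second sums the .dvd and .dvm files under an index directory so you can compare the totals before and after enabling docValues. Both function names are hypothetical, the index path (often something like <core>/data/index) is an assumption about your layout, and segments written in compound-file format (.cfs) keep their docValues inside the compound file, so the sum can undercount for those.

```python
import os
from collections import defaultdict

# Lucene docValues data (.dvd) and metadata (.dvm) file extensions.
DOCVALUES_EXTENSIONS = (".dvd", ".dvm")


def estimate_int_docvalues_bytes(max_doc, bytes_per_value=4):
    """Rule of thumb: about maxDoc * 4 bytes for one int field.
    Actual files are usually smaller because Lucene packs the values."""
    return max_doc * bytes_per_value


def docvalues_bytes(index_dir):
    """Sum the sizes of .dvd and .dvm files found under index_dir,
    grouped by extension."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            ext = os.path.splitext(name)[1]
            if ext in DOCVALUES_EXTENSIONS:
                totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)
```

Running docvalues_bytes on a copy of the index taken before the change and again after reindexing gives the measured growth, which you can then set against the estimate.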