Depending on what your documents look like, enabling docValues could allow
you to save space by switching to stored="false", since Solr can fetch the
stored value from docValues. I say it depends on your documents and use case
because it can sometimes be slower to read a single field from docValues when
all the other fields come from stored values. If you do not do
matches/lookups/range-queries on some fields, you may even be able to set
indexed="false" and save space in the inverted index.
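A minimal sketch of such a field definition, written as the JSON body of a Schema API "add-field" command (POSTed to the collection's /schema endpoint). The field name "popularity" and type "pint" are hypothetical placeholders. The `useDocValuesAsStored` property is what lets Solr return the value from docValues; it is set explicitly here only for clarity.

```python
import json

def docvalues_only_field(name, field_type, searchable=True):
    """Build a Schema API "add-field" command for a field whose values
    live in docValues rather than in stored fields.

    name and field_type are hypothetical; field_type must be a fieldType
    in your schema that supports docValues (e.g. a point or string type)."""
    field = {
        "name": name,
        "type": field_type,
        "docValues": True,
        # No stored copy: Solr returns the value from docValues instead
        "stored": False,
        "useDocValuesAsStored": True,
        # indexed=False drops the field from the inverted index; only do this
        # if you never match, look up or range-query on it
        "indexed": searchable,
    }
    return {"add-field": field}

payload = docvalues_only_field("popularity", "pint", searchable=False)
print(json.dumps(payload, indent=2))
```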

A benefit of having docValues enabled is that it lets you do atomic updates
to your docs, re-index from an existing index (not from source), and use
streaming expressions on all fields.
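As a rough illustration of the atomic-update case, the sketch below builds an update document that changes only the listed fields. Solr rebuilds the rest of the document from the existing index, which roughly requires every field to be stored or have docValues. The uniqueKey "id" and the field names are assumptions for the example.

```python
import json

def atomic_update(doc_id, **changes):
    """Build a Solr atomic-update document. Only the listed fields change;
    Solr reconstructs the other fields from their stored/docValues data.

    Each value in `changes` is an (operation, value) pair, where operation
    is an atomic-update modifier such as "set", "add" or "inc".
    Assumes the collection's uniqueKey field is named "id"."""
    doc = {"id": doc_id}
    for field, (op, value) in changes.items():
        doc[field] = {op: value}
    return doc

# Hypothetical field names; the list would be sent as JSON to /update
updates = [atomic_update("doc-42", popularity=("inc", 1), category=("set", "books"))]
print(json.dumps(updates))
```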

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 14 Jun 2018, at 04:13, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> I pretty much agree with your business side.
> 
> The rough size of a docValues field is one value of the field's type for
> each doc. So say you have an int field: the size is near maxDoc * 4 bytes.
> This is not totally accurate (there is some int packing done, for
> instance), but it'll do. If you really want an accurate count, look at the
> before/after size of the *.dvd and *.dvm segment files in your index.
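A small sketch of both approaches: the maxDoc * 4 bytes rule of thumb for an int field, and summing the *.dvd/*.dvm files in an index directory. This assumes you can read the core's index directory. Lucene packs and compresses values, so the estimate is usually on the high side. Segments written in compound-file format (*.cfs) hide their docValues files, so in that case the on-disk sum undercounts.

```python
from pathlib import Path

def estimate_numeric_dv_bytes(max_doc, bytes_per_value=4):
    """Rule of thumb: one fixed-width value per document (4 bytes for an int).
    Real files are often smaller because Lucene packs/compresses values."""
    return max_doc * bytes_per_value

def docvalues_bytes_on_disk(index_dir):
    """Sum the sizes of docValues data (*.dvd) and metadata (*.dvm) files
    in a Lucene index directory. Files inside compound *.cfs segments are
    not counted."""
    root = Path(index_dir)
    return sum(p.stat().st_size
               for pattern in ("*.dvd", "*.dvm")
               for p in root.glob(pattern))

print(estimate_numeric_dv_bytes(10_000_000))  # → 40000000 (about 40 MB)
# Hypothetical path; point it at your core's data/index directory:
# print(docvalues_bytes_on_disk("/var/solr/data/mycore/data/index"))
```

Comparing `docvalues_bytes_on_disk` before and after reindexing with docValues enabled gives the "before/after" figure.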
> 
> However, it's "pay me now or pay me later". The critical operations
> are faceting, grouping and sorting. If you do any of those operations
> on a field that is _not_ docValues=true, it will be uninverted on the
> _Java heap_, where it will consume GC cycles, put pressure on all your
> other operations, etc. This process will be done _every_ time you open
> a new searcher and use these fields.
> 
> If the field _does_ have docValues=true, the data will be held in the OS's
> memory space, _not_ the JVM's heap, because of MMapDirectory (see:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html).
> Among other virtues, it can be paged out (although you don't want it
> to be, that's still better than OOMing). Plus, loading it is just reading
> it off disk rather than running the expensive uninversion process.
> 
> And if you don't do any of those operations (grouping, sorting and
> faceting), then the bits just sit there on disk doing nothing.
> 
> So say you carefully define what fields will be used for any of the
> three operations and enable docValues. Then 3 months later the
> business side comes back with "oh, we need to facet on another field".
> Your choices are:
> 1> live with the increased heap usage and other resource contention,
> perhaps panicking along the way when your processes OOM and prod
> goes down.
> or
> 2> reindex from scratch, starting with a totally new collection.
> 
> And note the fragility here. Your application can be humming along
> just fine for months. Then one fine day someone innocently submits a
> query that sorts on a new field that has docValues=false and B-OOM.
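One way to catch this before it bites is to check a request's sort, facet and group fields against the schema. The sketch below assumes a hypothetical `schema_fields` mapping of field name to properties, for example built from the Schema API field listing with showDefaults=true so inherited fieldType defaults are included. A field with no explicit docValues entry is conservatively treated as docValues=false.

```python
def fields_that_would_uninvert(schema_fields, sort=(), facet=(), group=()):
    """Return the requested sort/facet/group fields that lack docValues=true,
    i.e. the ones Solr would uninvert on the Java heap.

    schema_fields: hypothetical dict of field name -> properties dict.
    Missing or unknown entries count as docValues=false (conservative)."""
    requested = set(sort) | set(facet) | set(group)
    return sorted(
        name for name in requested
        if not schema_fields.get(name, {}).get("docValues", False)
    )

schema = {
    "price": {"type": "pfloat", "docValues": True},
    "title": {"type": "text_general", "docValues": False},
    "brand": {"type": "string"},  # docValues not listed: treated as false
}
print(fields_that_would_uninvert(schema, sort=["price"], facet=["brand", "title"]))
# → ['brand', 'title']
```

Run as a check in tests or at query-building time, this turns a surprise OOM into an explicit failure you can take back to the business side.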
> 
> If (and only if) you can _guarantee_ that fieldX will never be used
> for any of the three operations, then turning off docValues for that
> field will save you some disk space. But that's the only advantage.
> Well, alright: if you have to do a full index replication, that'll
> happen a bit faster too.
> 
> So I prefer to err on the side of caution. I recommend making fields
> docValues=true unless I can absolutely guarantee (and business _also_
> agrees)
> 1> that fieldX will never be used for sorting, grouping or faceting,
> or
> 2> if they can't promise that, they guarantee to give me time to
> completely reindex.
> 
> Best,
> Erick
> 
> 
> On Wed, Jun 13, 2018 at 4:30 PM, root23 <s.manuj...@gmail.com> wrote:
>> Hi all,
>> Does anyone know how much the index size typically increases when we enable
>> docValues on a field?
>> Our business side wants to enable sorting on most of our fields. I am
>> trying to push back, saying that it will increase the index size, since
>> enabling docValues will create the uninverted index.
>> 
>> I know the size probably depends on what values are in the fields, but I need
>> a general idea so that I can convince them that enabling it on these fields is
>> costly, and by how much.
>> 
>> If anyone knows how to find this out by looking at an existing Solr index that
>> has docValues enabled, that would also be a great help.
>> 
>> Thanks !!!
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
