I pretty much agree with your business side.

The rough size of a docValues field is one value for each doc. So
say you have an int field: the size is near maxDoc * 4 bytes. This is
not totally accurate (there is some int packing done, for instance),
but it'll do. If you really want an accurate count, compare the
before/after size of the *.dvd and *.dvm segment files in your index.
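
Both numbers above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function names and
the 4-bytes-per-value default are made up for the example, and the
directory you pass in is whatever path your core's index actually lives
at. If your segments use compound files (*.cfs), the docValues data is
packed inside them and this file-suffix approach will not see it.

```python
import os

# Segment file suffixes that hold docValues data and metadata.
DOCVALUES_SUFFIXES = (".dvd", ".dvm")


def estimate_docvalues_bytes(max_doc, bytes_per_value=4):
    """Back-of-the-envelope estimate: one fixed-width value per document.

    Ignores int packing, so treat the result as a rough ceiling for a
    numeric field, not an exact figure.
    """
    return max_doc * bytes_per_value


def docvalues_bytes_on_disk(index_dir):
    """Sum the sizes of all *.dvd and *.dvm files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            if name.endswith(DOCVALUES_SUFFIXES):
                total += os.path.getsize(os.path.join(root, name))
    return total
```

Run docvalues_bytes_on_disk() against a copy of the index before and
after enabling docValues on the field (and reindexing); the difference
is the real cost, which you can compare with the estimate.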

However, it's "pay me now or pay me later". The critical operations
are faceting, grouping and sorting. If you do any of those operations
on a field that is _not_ docValues=true, it will be uninverted on the
_Java heap_, where it will consume GC cycles, put pressure on all your
other operations, etc. This process will be done _every_ time you open
a new searcher and use these fields.

If the field _does_ have docValues=true, that will be held in the OS's
memory space, _not_ the JVM's heap due to using MMapDirectory (see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html).
Among other virtues, it can be swapped out (although you don't want it
to be, it's still better than OOMing). Plus loading it is just reading
it off disk rather than the expensive uninversion process.

And if you don't do any of those operations (grouping, sorting and
faceting), then the bits just sit there on disk doing nothing.

So say you carefully define what fields will be used for any of the
three operations and enable docValues. Then 3 months later the
business side comes back with "oh, we need to facet on another field".
Your choices are:
1> live with the increased heap usage and other resource contention.
Perhaps along the way panicking because your processes OOM and prod
goes down.
or
2> reindex from scratch, starting with a totally new collection.

And note the fragility here. Your application can be humming along
just fine for months. Then one fine day someone innocently submits a
query that sorts on a new field that has docValues=false and B-OOM.

If (and only if) you can _guarantee_ that fieldX will never be used
for any of the three operations, then turning off docValues for that
field will save you some disk space. But that's the only advantage.
Well, alright. If you have to do a full index replication that'll
happen a bit faster too.

So I prefer to err on the side of caution. I recommend making fields
docValues=true unless I can absolutely guarantee (and business _also_
agrees)
1> that fieldX will never be used for sorting, grouping or faceting,
or
2> if they can't promise that, they guarantee to give me time to
completely reindex.

Best,
Erick


On Wed, Jun 13, 2018 at 4:30 PM, root23 <s.manuj...@gmail.com> wrote:
> Hi all,
> Does anyone know how much typically index size increments when we enable doc
> value on a field.
> Our business side want to enable sorting fields on most of our fields. I am
> trying to push back saying that it will increase the index size, since
> enabling docvalues will create the univerted index.
>
> I know the size probably depends on what values are in the fields but i need
> a general idea so that i can convince them that enabling on the fields is
> costly and it will incur this much cost.
>
> If anyone knows how to find this out looking at an existing solr index which
> has docvalues enabled , that will  also be great help.
>
> Thanks !!!
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
