Pratik may have jumped right to the difference. We'd have gotten there eventually by looking at file extensions, but just checking his recommendation would be the first thing to do!
bq: what would be the right scenarios to use docvalues='true'?

Whenever you want to facet, group or sort on the field. This _will_ increase the index size on disk, but it's almost always a good tradeoff. Here's why:

To facet, group or sort you need to "uninvert" the field. If you have docValues=false, this uninversion is done at run-time into Java's heap. If you have docValues=true, the uninversion is done at _index_ time and the result is stored on disk. Now when it's required, it can be loaded in from disk efficiently (essentially de-serialized) and is held in OS memory rather than the Java heap due to the magic of MMapDirectory, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

bq: In what situation would it make sense to have indexed=false and docValues=true?

When you want to return _only_ fields that have docValues=true. If you return fields with stored=true and docValues=false, Solr/Lucene has to
1> read the stored values from disk (minimum 16K block)
2> decompress it
3> extract the field

With docValues, since they're only simple field types, all that you have to do is read the value from the docValues structure, which is much more efficient.

HOWEVER, there are two caveats
1> The entire docValues field will be MMapped, so there's a time/space tradeoff.
2> docValues are stored in a sorted_set. This is relevant for multiValued fields because:
2a> values are returned in sorted order, not the order they were in the document
2b> identical values are collapsed.
So if the input values for a particular doc were
4, 3, 6, 4, 5, 2, 6, 5, 6, 5, 4, 3, 2
you'd get back
2, 3, 4, 5, 6

If you can live with those caveats, then returning field values would involve much less work (both I/O and CPU), especially in high-throughput situations.

NOTE: there are a couple of JIRAs IIRC that have to do with not storing the <uniqueKey> though.
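To make caveat 2 concrete, here's a toy Python sketch of what you'd see when a multiValued field comes back from stored values versus from docValues. This is only a model of the observable behavior described above, not how Lucene lays anything out on disk, and the function names (stored_values, doc_values_sorted_set) are made up for illustration:

```python
# Toy model only: mimics the behavior described in the email,
# not Lucene's actual storage format or API.

def stored_values(values):
    """Stored fields hand back the values exactly as they were indexed."""
    return list(values)


def doc_values_sorted_set(values):
    """A multiValued docValues field behaves like a sorted set:
    duplicates are collapsed and the original order is lost."""
    return sorted(set(values))


# The input values for one document, taken from the example above.
doc = [4, 3, 6, 4, 5, 2, 6, 5, 6, 5, 4, 3, 2]

print(stored_values(doc))          # same order, duplicates kept
print(doc_values_sorted_set(doc))  # [2, 3, 4, 5, 6]
```

If your application depends on the original order or on repeated values, that's the case where you'd still want stored=true for that field.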
Best,
Erick

On Wed, Feb 14, 2018 at 7:01 AM, Pratik Patel <pra...@semandex.net> wrote:

> I had a similar issue with index size after upgrading to version 6.4.1 from
> 5.x. The issue for me was that the field which caused index size to be
> increased disproportionately had a field type("text_general") for which
> default value of omitNorms was not true. Turning it on explicitly on field
> fixed the problem. Following is the link to my related question. You can
> verify value of omitNorms for your fields to check whether this is
> applicable in your case or not.
> http://search-lucene.com/m/Solr/eHNlagIB7209f1w1?subj=Fwd+Solr+dynamic+field+blowing+up+the+index+size
>
> On Tue, Feb 13, 2018 at 8:48 PM, Howe, David <david.h...@auspost.com.au>
> wrote:
>
>>
>> I have set docValues=false on all of the string fields in our index that
>> have indexed=false and stored=true. This gave a small improvement in the
>> index size from 13.3GB to 12.82GB.
>>
>> I have also tried running an optimize, which then reduced the index to
>> 12.6GB.
>>
>> Next step is to dump the sizes of the Solr index files for the index
>> version that is the correct size and the version that has the large size.
>>
>> Regards,
>>
>> David
>>
>>
>> David Howe
>> Java Domain Architect
>> Postal Systems
>> Level 16, 111 Bourke Street Melbourne VIC 3000
>>
>> T 0391067904
>>
>> M 0424036591
>>
>> E david.h...@auspost.com.au
>>
>> W auspost.com.au
>> W startrack.com.au
>>
>> -----Original Message-----
>> From: Howe, David [mailto:david.h...@auspost.com.au]
>> Sent: Wednesday, 14 February 2018 7:26 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Index size increases disproportionately to size of added
>> field when indexed=false
>>
>>
>> Thanks Hoss. I will try setting docValues to false, as we only ever want
>> to be able to retrieve the value of this field.
>>
>> Regards,
>>
>> David