David:

Right, Optimize Is Evil. Well, actually in your case it's not. In your specific case you can optimize every time you build your index and be OK. Gory details here: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
But that's just for background. The key is how many deleted docs you have, which you can see from the admin UI screen. If you have 0 deleted docs, you have 0 space that would be reclaimed by an optimize. My bet is that you have no deleted docs; if so, just forget the whole optimize question, as it's a red herring.

"...storage increase would be approximately 200,000 * 19 = 3.8M bytes = 3.6MB rather than the 7.5GB..."

Actually I'd expect it to only be half that (1.9M). Stored fields are compressed on disk and we usually see about a 2:1 compression ratio. There'll be a little bit of fudge for metadata, but probably not enough to measure. So yes, this is totally weird.

I think you'll also find that docValues is set to true by default. This _still_ shouldn't be adding that much to this index, but what happens if you turn docValues off for that field?

Stored data is held in your *.fdt and *.fdx files. What's the total index space used by these two extensions with and without your field?

*.dvd files contain the docValues data. Again, what's the before/after size of all these files with and without your field?

These are two specific places to look, but in general I'm asking what the total size is by extension in your index directory with and without your field, on the guess that one extension will be massively bigger. This is totally surprising, but it'd give us a clue where to look.

Here are the file extensions and what they contain, BTW: https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/codecs/lucene70/package-summary.html

Best,
Erick

On Tue, Feb 13, 2018 at 3:41 AM, Alessandro Benedetti <a.benede...@sease.io> wrote:
> Hi David,
> given the fact that you are actually building a new index from scratch, my
> shot in the dark didn't hit any target.
> When you say: "Once the import finishes we save the docker image in the
> AWS docker repository. We then build our cluster using that image as the
> base"
>
> Do you mean just configuration-wise?
> Will the new cluster have any starting index on disk?
> If I understood your latest statements correctly, I expect a NO here.
>
> So you are building a completely new index and, comparing it to the old index
> (which is completely separate), you see such a big difference in size.
> This is extremely suspicious.
> Optimizing in the end is just a huge merge to force 1 (or N) final
> segments.
> Given the additional information you gave me, it's not going to make much
> difference.
>
> I would recommend checking how the index space is divided among the different
> file formats [1]
> (i.e. list how much space is dedicated to each extension).
>
> Stored content is in the .fdt files.
>
>
> [1]
> https://lucene.apache.org/core/6_4_0/core/org/apache/lucene/codecs/lucene62/package-summary.html#file-names
>
>
>
> -----
> ---------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
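
The per-extension size breakdown that both messages above ask for could be produced with something like the following. This is only a minimal sketch: it assumes the index directory is a flat folder of Lucene segment files, and the function names `size_by_extension` and `report` are made up for illustration (they are not part of Solr or Lucene).

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total_bytes} for regular files directly in index_dir."""
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if entry.is_file():
            # Segment files look like _0.fdt or _0_Lucene70_0.dvd; files with
            # no extension (e.g. segments_1) are grouped under "".
            ext = os.path.splitext(entry.name)[1]
            totals[ext] += entry.stat().st_size
    return dict(totals)


def report(totals):
    """Format the totals as text lines, largest extension first."""
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ["{:>8} {:>15,d}".format(ext or "(none)", size) for ext, size in rows]
```

Running `print("\n".join(report(size_by_extension(path))))` once against the index built with the field and once against the index built without it, where `path` is wherever the core's index directory lives, should make it obvious if a single extension such as .fdt or .dvd accounts for the growth.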