That's not the case (please read the entire email). I start with a fresh index each time I run my tests. In fact, I even tested (multiple times) by deleting the entire "data" folder (stopping and restarting Solr). In each case, I get exactly the same results.
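For reference, the numDocs / maxDocs comparison Upayavira suggests below can also be pulled from the Luke handler rather than the admin UI. Here is a minimal sketch in plain Java; the core name "test", host, and port are placeholders for my local setup:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class CheckIndexCounts {
        public static void main(String[] args) throws Exception {
            // The Luke handler reports numDocs (live documents) and maxDoc
            // (live plus deleted documents) in its "index" section.
            URL url = new URL("http://localhost:8983/solr/test/admin/luke?numTerms=0&wt=json");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);  // look for "numDocs" and "maxDoc"
                }
            }
        }
    }

If maxDoc is roughly double numDocs, the index is still carrying deleted copies of the documents, as described below.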
At one point, I started to wonder if my index was not optimized, but looking at the Solr admin page, there is a green check next to the "Optimized" text.

Steve

On Mon, Feb 15, 2016 at 3:29 PM, Upayavira <u...@odoko.co.uk> wrote:

> Not got time to read your mail in depth, but I bet it is because you are
> overwriting docs. When docs are overwritten, they are effectively marked
> as deleted and then re-inserted, thus leaving you with both versions of
> your doc physically in your index. When you query, though, the deleted
> one is filtered out.
>
> At some point later in time, when the number of commits you have made
> results in too many segments, a merge will be triggered, and this will
> remove the deleted documents from those merged segments.
>
> Compare the numDocs (number of undeleted docs) and the maxDocs (number
> of documents, whether deleted or not) for your index. I bet one will be
> 2x the other.
>
> Upayavira
>
> On Mon, Feb 15, 2016, at 08:12 PM, Steven White wrote:
> > Hi folks,
> >
> > I'm fixing code that I noticed has a defect. My expectation was that
> > once I made the fix, the index size would be smaller, but instead I
> > see it growing.
> >
> > Here is a stripped-down version of the code to show the issue:
> >
> > Buggy code #1:
> >
> >     for (String field : fieldsList)
> >     {
> >         // <== Notice how I'm adding the same value over and over
> >         doc.addField(SolrField_ID_LIST, "1");
> >         doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >     }
> >
> >     docsToAdd.add(doc);
> >
> > Fixed code #2:
> >
> >     for (String field : fieldsList)
> >     {
> >         doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >     }
> >
> >     // <== Notice how I'm now adding this value only once
> >     doc.addField(SolrField_ID_LIST, "1");
> >
> >     docsToAdd.add(doc);
> >
> > I index the exact same data in both cases; all that changed is the
> > logic of the code, per the above.
> >
> > On my test index of 1000 records, when I look at Solr's admin page
> > (the same is true looking at the physical disk in the "index" folder),
> > the index size for #1 is 834.77 KB, but for #2 it is 1.56 MB.
> >
> > As a side test, I changed the code to the following:
> >
> > Test code #3:
> >
> >     for (String field : fieldsList)
> >     {
> >         doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >     }
> >
> >     // doc.addField(SolrField_ID_LIST, "1");  // <== I no longer include this field
> >
> >     docsToAdd.add(doc);
> >
> > And now the index size is 2.27 MB!!!
> >
> > Yes, each time I run the test, I start with a fresh empty index
> > (num docs: 0, index size: 0).
> >
> > Here are my field definitions:
> >
> >     <field name="ALL_FIELDS_DATA" type="text" multiValued="true"
> >            indexed="true" required="false" stored="false"/>
> >     <field name="ID_LIST" type="string" multiValued="true"
> >            indexed="true" required="false" stored="false"/>
> >
> > My question is: why is my index size going up? I was expecting it
> > to go down because I'm now indexing less data into each Solr document.
> >
> > Thanks
> >
> > Steve
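P.S. For completeness, here is a self-contained sketch of the indexing loop from the snippets above (variant #2, the fixed code), assuming the SolrJ 5.x-era HttpSolrClient. The field-name constants, the uniqueKey "id", the core URL, and the sample data are placeholders, not the actual application code:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexSizeTest {
        // Stand-ins for SolrField_ID_LIST and SolrField_ALL_FIELDS_DATA above.
        private static final String SolrField_ID_LIST = "ID_LIST";
        private static final String SolrField_ALL_FIELDS_DATA = "ALL_FIELDS_DATA";

        public static void main(String[] args) throws Exception {
            List<String> fieldsList = Arrays.asList("f1", "f2", "f3");  // sample source fields
            String stringData = "some field content";                   // sample data

            List<SolrInputDocument> docsToAdd = new ArrayList<>();

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");  // assumes the schema's uniqueKey is "id"

            // Variant #2: ALL_FIELDS_DATA once per source field, ID_LIST only once.
            for (String field : fieldsList) {
                doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
            }
            doc.addField(SolrField_ID_LIST, "1");

            docsToAdd.add(doc);

            // "test" is a placeholder core name; adjust the URL for your setup.
            try (HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/test")) {
                client.add(docsToAdd);
                client.commit();
            }
        }
    }

Moving the doc.addField(SolrField_ID_LIST, "1") line back inside the loop reproduces variant #1, and commenting it out reproduces variant #3.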