That's not the case (please read the entire email).

I'm starting with a fresh index each time I run my tests.  In fact, I even
tested (multiple times) by deleting the entire "data" folder (stop / start
Solr).  In each case, I get the exact same results.
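
For anyone who wants to reproduce this, a reset between runs can be as
simple as a delete-all plus commit.  Here is a rough SolrJ sketch of that
step (the core URL is just a placeholder, not my actual setup):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;

  public class ResetIndex {
    public static void main(String[] args) throws Exception {
      // Placeholder core URL; newer SolrJ versions use HttpSolrClient.Builder instead
      SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
      client.deleteByQuery("*:*"); // drop every document
      client.commit();             // make the now-empty index visible
      client.close();
    }
  }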

At one point, I started to wonder if my index was not optimized, but looking
at the Solr admin page, there is a green check next to the "Optimized" text.

Steve


On Mon, Feb 15, 2016 at 3:29 PM, Upayavira <u...@odoko.co.uk> wrote:

> Not got time to read your mail in depth, but I bet it is because you are
> overwriting docs. When docs are overwritten, they are effectively marked
> as deleted then re-inserted, thus leaving you with both versions of your
> doc physically in your index. When you query though, the deleted one is
> filtered out.
>
> At some point later in time, when the number of commits you have made
> results in too many segments, a merge will be triggered, and this will
> remove the deleted documents from those merged segments.
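>
> (If you want to see that effect immediately rather than waiting for a
> merge, an explicit optimize will force it. A sketch, assuming you already
> have a SolrJ SolrClient named "client" pointed at the core:)
>
>   // Force-merges segments, which purges the deleted copies of overwritten docs.
>   // Heavy on a large index, but harmless on a small test core.
>   client.optimize();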
>
> Compare the numDocs (number of undeleted docs) and the maxDocs (number
> of documents, whether deleted or not) for your index. I bet one will be
> 2x the other.
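>
> A rough SolrJ sketch of reading both counts from the Luke handler (the
> core URL and class name here are placeholders, not anything from your
> setup):
>
>   import org.apache.solr.client.solrj.SolrClient;
>   import org.apache.solr.client.solrj.impl.HttpSolrClient;
>   import org.apache.solr.client.solrj.request.LukeRequest;
>   import org.apache.solr.client.solrj.response.LukeResponse;
>
>   public class IndexCounts {
>     public static void main(String[] args) throws Exception {
>       // Placeholder core URL; point it at your own core
>       SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
>       LukeResponse luke = new LukeRequest().process(client);
>       // numDocs counts only live docs; maxDoc also counts deleted-but-not-yet-merged docs
>       System.out.println("numDocs: " + luke.getIndexInfo().get("numDocs"));
>       System.out.println("maxDoc:  " + luke.getIndexInfo().get("maxDoc"));
>       client.close();
>     }
>   }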
>
> Upayavira
>
> On Mon, Feb 15, 2016, at 08:12 PM, Steven White wrote:
> > Hi folks,
> >
> > I'm fixing code that I noticed has a defect.  My expectation was that
> > once I made the fix, the index size would be smaller, but instead I see
> > it growing.
> >
> > Here is the stripped down version of the code to show the issue:
> >
> > Buggy code #1:
> >
> >   for (String field : fieldsList)
> >   {
> >     doc.addField(SolrField_ID_LIST, "1"); // <== Notice how I'm adding the same value over and over
> >     doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >   }
> >
> >   docsToAdd.add(doc);
> >
> > Fixed code #2:
> >
> >   for (String field : fieldsList)
> >   {
> >     doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >   }
> >
> >   doc.addField(SolrField_ID_LIST, "1"); // <== Notice how I'm now adding this value only once
> >
> >   docsToAdd.add(doc);
> >
> > I index the exact same data in both cases; all that changed is the logic
> > of the code, per the above.
> >
> > On my test index of 1000 records, when I look at Solr's admin page (same
> > is true looking at the physical disk in the "index" folder) the index
> > size for #1 is 834.77 KB, but for #2 it is 1.56 MB.
> >
> > As a side test, I changed the code to the following:
> >
> > Test code #3:
> >
> >   for (String field : fieldsList)
> >   {
> >     doc.addField(SolrField_ALL_FIELDS_DATA, stringData);
> >   }
> >
> >   // doc.addField(SolrField_ID_LIST, "1"); // <== I no longer include this field
> >
> >   docsToAdd.add(doc);
> >
> > And now the index size is 2.27 MB !!!
> >
> > Yes, each time I run the test, I start with a fresh empty index (num
> > docs: 0, index size: 0).
> >
> > Here are my field definitions:
> >
> >   <field name="ALL_FIELDS_DATA" type="text" multiValued="true"
> > indexed="true" required="false" stored="false"/>
> >   <field name="ID_LIST" type="string" multiValued="true" indexed="true"
> > required="false" stored="false"/>
> >
> > My question is: why is my index size going up?  I was expecting it to go
> > down because I'm now indexing less data into each Solr document.
> >
> > Thanks
> >
> > Steve
>
