On 6/19/2013 10:38 AM, Shawn Heisey wrote:
> Looking at the numDocs for each segment, here's what I think is happening:
> The autoCommit kicks in after the first 25000 docs (25002 to be
> precise), but the ram buffer isn't emptied. The next 3339 documents get
> indexed, at which point the ram buffer fills up, so it flushes another
> segment. Then it does another 21674 docs to approximately reach 25000
> for autoCommit, which forces another segment flush, but without emptying
> the buffer. Lather, rinse, repeat.
I seem to be wrong about it being strictly related to ramBufferSizeMB.
Today I bumped the buffer up to 256MB, restarted Solr, and started
another full-import.
If I were completely right about the buffer interaction, this should
have resulted in a few roughly equal-sized segments being created
before a small one. It didn't change anything - it's still two
segments per autocommit: one of around 3000 docs, and another that
brings the total to about 25000.
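
To make that expectation concrete, here is a toy Python model of the hypothesis. It is only a sketch of the guess described above, not of what Solr or Lucene actually does. It assumes that the buffer fills after a fixed number of documents (buffer_capacity_docs, a made-up stand-in for ramBufferSizeMB), that an autoCommit flushes a segment without resetting the buffer accounting, and that a buffer-full flush does reset it. The function name and its parameters are hypothetical.

```python
def simulate_segments(total_docs, commit_every, buffer_capacity_docs):
    """Toy model: return a list of (reason, num_docs) for each flushed segment.

    Assumptions (not confirmed Solr/Lucene behaviour):
      - autoCommit flushes a segment but leaves the buffer accounting alone
      - a full buffer flushes a segment and resets the buffer accounting
      - buffer capacity is expressed in documents instead of megabytes
    """
    segments = []
    docs_in_segment = 0    # docs indexed since the last flush of any kind
    docs_since_commit = 0  # docs indexed since the last autoCommit
    buffer_fill = 0        # docs counted against the buffer since it last filled

    for _ in range(total_docs):
        docs_in_segment += 1
        docs_since_commit += 1
        buffer_fill += 1

        # Buffer is full: flush a segment and reset the buffer accounting.
        if buffer_fill >= buffer_capacity_docs:
            segments.append(("ram", docs_in_segment))
            docs_in_segment = 0
            buffer_fill = 0

        # autoCommit threshold reached: flush, but keep buffer_fill as it is.
        if docs_since_commit >= commit_every:
            if docs_in_segment:
                segments.append(("commit", docs_in_segment))
            docs_in_segment = 0
            docs_since_commit = 0

    # Anything left over at the end of the import goes into a final segment.
    if docs_in_segment:
        segments.append(("final", docs_in_segment))
    return segments


if __name__ == "__main__":
    for capacity in (28000, 70000):
        sizes = [n for _, n in simulate_segments(100000, 25000, capacity)]
        print(capacity, sizes)
```

With a capacity of 28000 docs this model starts with a 25000-doc segment followed by a 3000-doc one. Raising the capacity to 70000 gives two 25000-doc segments before the first smaller ones, which is the kind of change the larger buffer was expected to produce.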
There's still something weird going on, but now I know that I don't
completely understand it. I hope someone can shed some light.
Thanks,
Shawn