I'd say you have a lot of documents with the same id. When you add a doc whose id already exists, the old one is deleted first and then the new one is added (atomically, though).
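To make that concrete, here is a toy model in Python. It is only a sketch of the overwrite-by-id idea, not Lucene or Solr code: `ToyIndex`, `add`, and `merge_all` are made-up names, and real merges work segment by segment rather than all at once. It tracks two counters in the spirit of Solr's stats: `num_docs` counts live docs only, while `max_doc` also counts copies that are marked deleted but have not been merged away yet.

```python
# Toy model of overwrite-by-id. NOT Lucene/Solr code; all names are hypothetical.
class ToyIndex:
    def __init__(self):
        self.slots = []   # every copy written and not yet merged away: [doc_id, deleted]
        self.live = {}    # doc_id -> position of the live copy in self.slots

    def add(self, doc_id):
        # Same id already present: mark the old copy deleted, then append the new one.
        if doc_id in self.live:
            self.slots[self.live[doc_id]][1] = True
        self.live[doc_id] = len(self.slots)
        self.slots.append([doc_id, False])

    def merge_all(self):
        # Stand-in for merging every segment: deleted copies are dropped.
        self.slots = [s for s in self.slots if not s[1]]
        self.live = {s[0]: i for i, s in enumerate(self.slots)}

    @property
    def num_docs(self):
        return len(self.live)    # live docs only

    @property
    def max_doc(self):
        return len(self.slots)   # live docs plus marked-deleted copies


index = ToyIndex()
for i in range(1000):
    index.add(i % 100)           # 1000 adds, but only 100 distinct ids

before = (index.num_docs, index.max_doc)
index.merge_all()
after = (index.num_docs, index.max_doc)
print(before, after)             # (100, 1000) (100, 100)
```

In this model the number of adds says very little about the number of live docs; only the count of distinct ids does.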
The deleted docs are not removed from the index immediately, though; the doc id is just marked as deleted. Over time, as segments are merged (merges are triggered as you add new documents), the deletes are removed; which deletes go depends on which segments have been merged. So if you add a ton of documents over time, many with the same ids, you would likely see this type of maxDoc/numDocs churn. maxDoc includes deleted docs, while numDocs does not.

--
- Mark

http://www.lucidimagination.com

On Mon, Aug 17, 2009 at 11:09 PM, Funtick <f...@efendi.ca> wrote:

>
> After running an application which heavily uses MD5 HEX-representation as
> <uniqueKey> for SOLR v.1.4-dev-trunk:
>
> 1. After 30 hours:
> 101,000,000 documents added
>
> 2. Commit:
> numDocs = 783,714
> maxDoc = 3,975,393
>
> 3. Upload new docs to SOLR during 1 hour(!!!!!!!), then commit, then
> optimize:
> numDocs=1,281,851
> maxDocs=1,281,851
>
> It looks _extremely_ strange that within an hour I have such a huge
> increase with same 'average' document set...
>
> I am suspecting something goes wrong with Lucene buffer flush / index merge
> OR SOLR - Unique ID handling...
>
> According to my own estimates, I should have about 10,000,000 new documents
> now... I had 0.5 millions within an hour, and 0.8 mlns within a day; same
> 'random' documents.
>
> This morning index size was about 4Gb, then suddenly dropped below 0.5 Gb.
> Why? I haven't issued any "commit"...
>
> I am using ramBufferMB=8192
>
> --
> View this message in context:
> http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
> Sent from the Solr - User mailing list archive at Nabble.com.