I'd say you have a lot of documents that have the same id.
When you add a doc with the same id, the old one is first deleted and then the
new one is added (the two steps are atomic).

The deleted docs are not removed from the index immediately, though; the doc
id is just marked as deleted.
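A minimal sketch of that bookkeeping, written as a hypothetical Python toy model (the `ToyIndex` class and its methods are illustrative names, not Solr or Lucene APIs): an update marks the old copy deleted instead of removing it, so maxDoc keeps growing while numDocs counts only live docs.

```python
class ToyIndex:
    """Hypothetical toy model of update-by-unique-key; not Lucene's real code."""

    def __init__(self):
        # Each entry is [key, deleted]; docs are only ever appended.
        self.docs = []

    def update(self, key):
        # Step 1: mark any live doc with the same unique key as deleted.
        # The entry stays where it is; only its flag changes.
        for doc in self.docs:
            if doc[0] == key and not doc[1]:
                doc[1] = True
        # Step 2: append the new version as a live doc.
        # (Lucene makes delete+add atomic for readers; this toy ignores concurrency.)
        self.docs.append([key, False])

    def max_doc(self):
        # Every stored doc, including those only marked deleted.
        return len(self.docs)

    def num_docs(self):
        # Live docs only.
        return sum(1 for _, deleted in self.docs if not deleted)


idx = ToyIndex()
for key in ["a", "b", "a", "c", "a"]:
    idx.update(key)
print(idx.max_doc(), idx.num_docs())  # 5 3
```

Five updates over three distinct ids leave five stored docs but only three live ones.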

Over time, as segments are merged (merges are triggered as you add new
documents), the deleted docs are actually removed. Which deletes get removed
depends on which segments have been merged.
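A minimal sketch of that, assuming a hypothetical toy representation in Python (the `merge`, `max_doc` and `num_docs` helpers are illustrative, not Lucene APIs, and in a real index the choice of segments to merge is made by Lucene's merge policy): a merge rewrites the chosen segments and drops their deleted docs, while deletes in segments that were not merged stay behind.

```python
# Hypothetical toy model: a segment is a list of (key, deleted) tuples.
# This is not Lucene's merge policy; it only shows what a merge does to deletes.

def merge(segs):
    """Rewrite several segments as one, physically dropping deleted docs."""
    return [(key, deleted) for seg in segs for (key, deleted) in seg if not deleted]


def max_doc(segs):
    # Every stored doc, deleted or not.
    return sum(len(seg) for seg in segs)


def num_docs(segs):
    # Live docs only.
    return sum(1 for seg in segs for (_, deleted) in seg if not deleted)


segments = [
    [("a", True), ("b", False)],
    [("a", True), ("c", False)],
    [("d", True), ("a", False), ("d", False)],
]
print(max_doc(segments), num_docs(segments))  # 7 4

# Merge only the first two segments: their deletes are reclaimed,
# but the deleted "d" in the third segment still counts toward maxDoc.
merged = [merge(segments[:2])] + segments[2:]
print(max_doc(merged), num_docs(merged))  # 5 4

# Merging everything into one segment (roughly what an optimize does) leaves no deletes.
fully_merged = [merge(segments)]
print(max_doc(fully_merged), num_docs(fully_merged))  # 4 4
```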

So if you add a ton of documents over time, many with the same ids, you
would likely see this kind of maxDoc/numDocs churn: maxDoc includes deleted
docs, while numDocs does not.
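As a rough worked example using only the figures from the quoted message below (nothing here is measured independently): after the commit in step 2, the gap between maxDoc and numDocs is the number of docs that were marked deleted but not yet merged away. After the optimize in step 3 the two counts are equal, which is what you would expect once every segment has been merged and all deletes have been dropped.

```python
# Figures reported in the quoted message (after the commit in step 2).
num_docs = 783_714
max_doc = 3_975_393

# Docs still stored in segments but marked deleted (replaced by a newer
# doc with the same uniqueKey, under the explanation above).
deleted_but_not_merged = max_doc - num_docs
print(deleted_but_not_merged)  # 3191679
print(f"{deleted_but_not_merged / max_doc:.1%} of maxDoc")  # 80.3% of maxDoc
```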


-- 
- Mark

http://www.lucidimagination.com

On Mon, Aug 17, 2009 at 11:09 PM, Funtick <f...@efendi.ca> wrote:

>
> After running an application that heavily uses the MD5 hex representation as
> the <uniqueKey> for SOLR v.1.4-dev-trunk:
>
> 1. After 30 hours:
> 101,000,000 documents added
>
> 2. Commit:
> numDocs = 783,714
> maxDoc = 3,975,393
>
> 3. Upload new docs to SOLR for 1 hour(!!!!!!!), then commit, then
> optimize:
> numDocs=1,281,851
> maxDoc=1,281,851
>
> It looks _extremely_ strange that within an hour I have such a huge
> increase with the same 'average' document set...
>
> I suspect something is going wrong with the Lucene buffer flush / index
> merge, OR with SOLR's unique ID handling...
>
> According to my own estimates, I should have about 10,000,000 new documents
> now... I had 0.5 million within an hour, and 0.8 million within a day; same
> 'random' documents.
>
> This morning the index size was about 4 GB, then it suddenly dropped below
> 0.5 GB. Why? I haven't issued any "commit"...
>
> I am using ramBufferMB=8192
>
> --
> View this message in context:
> http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
