One more hour, and I have +0.5 million more documents (after commit/optimize).

Something strange is happening with the SOLR buffer flush (when there is
a single segment???)... an explicit commit prevents it...
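
(For reference, the explicit commit/optimize is nothing special; a minimal
SolrJ sketch, with a placeholder URL for my install:)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ExplicitCommit {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; adjust host/port/core for your install.
        SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.commit();   // flush buffered adds and open a new searcher
        server.optimize(); // merge segments; physically drops deleted docs
    }
}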

30 hours, with index flush, commit: numDocs = 783,714
+ 1 hour, commit, optimize: numDocs = 1,281,851
+ 1 hour, commit, optimize: numDocs = 1,786,552

Same random docs retrieved from the web...
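
(The numDocs/maxDoc figures above can be read off the Luke request
handler; a minimal sketch of one way to do that, assuming a default
localhost install:)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class IndexStats {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; /admin/luke reports index-level stats.
        URL luke = new URL("http://localhost:8983/solr/admin/luke?numTerms=0");
        BufferedReader in =
            new BufferedReader(new InputStreamReader(luke.openStream()));
        for (String line; (line = in.readLine()) != null; ) {
            // The XML response contains <int name="numDocs"> and
            // <int name="maxDoc"> elements.
            if (line.contains("numDocs") || line.contains("maxDoc")) {
                System.out.println(line.trim());
            }
        }
        in.close();
    }
}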



Funtick wrote:
> 
> 
> But how do you explain that within one hour (after a commit) I got about
> 500,000 new documents, while within 30 hours (after a commit) I got only
> 783,714?
> 
> The same _random_enough_ documents...
> 
> BTW, the SOLR Console was showing only a few hundred "deletesById",
> although I don't use deleteById explicitly; only "update" with
> "allowOverwrite" and "uniqueId".
> 
> 
> 
> 
> markrmiller wrote:
>> 
>> I'd say you have a lot of documents that have the same id.
>> When you add a doc with the same id, first the old one is deleted, then
>> the new one is added (atomically, though).
>> 
>> The deleted docs are not removed from the index immediately, though;
>> the doc id is just marked as deleted.
>> 
>> Over time, as segments are merged due to hitting triggers while adding
>> new documents, deletes are removed (which deletes are removed depends
>> on which segments have been merged).
>> 
>> So if you add a ton of documents over time, many with the same ids, you
>> would likely see this type of maxDoc/numDocs churn. maxDoc will include
>> deleted docs while numDocs will not.
>> 
>> 
>> -- 
>> - Mark
>> 
>> http://www.lucidimagination.com
>> 
>> On Mon, Aug 17, 2009 at 11:09 PM, Funtick <f...@efendi.ca> wrote:
>> 
>>>
>>> After running an application which heavily uses an MD5 HEX
>>> representation as <uniqueKey> for SOLR v.1.4-dev-trunk:
>>>
>>> 1. After 30 hours:
>>> 101,000,000 documents added
>>>
>>> 2. Commit:
>>> numDocs = 783,714
>>> maxDoc = 3,975,393
>>>
>>> 3. Upload new docs to SOLR for 1 hour (!), then commit, then
>>> optimize:
>>> numDocs = 1,281,851
>>> maxDoc = 1,281,851
>>>
>>> It looks _extremely_ strange that within an hour I see such a huge
>>> increase with the same 'average' document set...
>>>
>>> I suspect something is going wrong with the Lucene buffer flush /
>>> index merge, OR with SOLR's unique ID handling...
>>>
>>> According to my own estimates, I should have about 10,000,000 new
>>> documents by now... I got 0.5 million within an hour, but only 0.8
>>> million within a day; same 'random' documents.
>>>
>>> This morning the index size was about 4 GB, then it suddenly dropped
>>> below 0.5 GB. Why? I hadn't issued any "commit"...
>>>
>>> I am using ramBufferSizeMB=8192
>>>
>> 
>> 
> 
> 
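
To reproduce what Mark describes above: a minimal SolrJ sketch, where the
URL and the field names "id" and "text" are placeholders, not my actual
schema. Two fetches of the same content hash to the same MD5 HEX
<uniqueKey>, so the second add deletes and re-adds the doc; after commit,
numDocs counts it once, while maxDoc keeps counting the deleted copy
until a segment merge or an optimize.

import java.math.BigInteger;
import java.security.MessageDigest;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OverwriteDemo {

    // MD5 HEX of the document body, used as the <uniqueKey>; identical
    // content therefore maps to the same id.
    static String md5Hex(String s) throws Exception {
        byte[] d = MessageDigest.getInstance("MD5")
                                .digest(s.getBytes("UTF-8"));
        return String.format("%032x", new BigInteger(1, d));
    }

    public static void main(String[] args) throws Exception {
        SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        String body = "same random document fetched twice";
        for (int i = 0; i < 2; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", md5Hex(body));  // assumed uniqueKey field
            doc.addField("text", body);        // assumed body field
            server.add(doc); // 2nd add deletes the 1st, then re-adds it
        }
        server.commit();
        // numDocs is now 1; maxDoc stays at 2 until the segment is
        // merged or the index is optimized.
    }
}

That matches the numbers above: after an optimize, maxDoc collapses back
down to numDocs.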
