One more hour, and I have +0.5 million more (after commit/optimize). Something strange is happening with the SOLR buffer flush (if we have a single segment???)... an explicit commit prevents it...
30 hours, with index flush, commit: 783,714
+ 1 hour, commit, optimize: 1,281,851
+ 1 hour, commit, optimize: 1,786,552

Same random docs retrieved from the web...


Funtick wrote:
>
> But how to explain that within an hour (after commit) I had about
> 500,000 new documents, and within 30 hours (after commit) only 783,714?
>
> Same _random_enough_ documents...
>
> BTW, the SOLR Console was showing only a few hundred "deletesById"
> although I don't use any deleteById explicitly; only "update" with
> "allowOverwrite" and "uniqueId".
>
>
> markrmiller wrote:
>>
>> I'd say you have a lot of documents that have the same id.
>> When you add a doc with the same id, first the old one is deleted, then
>> the new one is added (atomically, though).
>>
>> The deleted docs are not removed from the index immediately, though - the
>> doc id is just marked as deleted.
>>
>> Over time, though, as segments are merged due to hitting triggers while
>> adding new documents, deletes are removed (which deletes depends on which
>> segments have been merged).
>>
>> So if you add a ton of documents over time, many with the same ids, you
>> would likely see this type of maxDoc/numDocs churn. maxDoc will include
>> deleted docs while numDocs will not.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>> On Mon, Aug 17, 2009 at 11:09 PM, Funtick <f...@efendi.ca> wrote:
>>
>>> After running an application which heavily uses an MD5 HEX representation
>>> as <uniqueKey> for SOLR v.1.4-dev-trunk:
>>>
>>> 1. After 30 hours:
>>> 101,000,000 documents added
>>>
>>> 2. Commit:
>>> numDocs = 783,714
>>> maxDoc = 3,975,393
>>>
>>> 3. Upload new docs to SOLR during 1 hour(!!!!!!!), then commit, then
>>> optimize:
>>> numDocs = 1,281,851
>>> maxDoc = 1,281,851
>>>
>>> It looks _extremely_ strange that within an hour I have such a huge
>>> increase with the same 'average' document set...
>>>
>>> I suspect something is going wrong with the Lucene buffer flush / index
>>> merge OR SOLR's unique ID handling...
>>>
>>> According to my own estimates, I should have about 10,000,000 new
>>> documents by now... I had 0.5 million within an hour, and 0.8 million
>>> within a day; same 'random' documents.
>>>
>>> This morning the index size was about 4Gb, then it suddenly dropped below
>>> 0.5 Gb. Why? I haven't issued any "commit"...
>>>
>>> I am using ramBufferMB=8192
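Mark's overwrite explanation is easy to reproduce with SolrJ. A minimal sketch against the 1.4-era CommonsHttpSolrServer API; the server URL and field names below are assumptions, not taken from the setup above:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class OverwriteDemo {
        public static void main(String[] args) throws Exception {
            // Assumed local Solr URL; adjust for your installation.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            for (int i = 0; i < 2; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "same-md5-hex");    // same <uniqueKey> both times
                doc.addField("text", "version " + i);
                server.add(doc);  // second add deletes the first, then adds (atomic)
            }
            server.commit();
            // After the commit: numDocs = 1, but maxDoc = 2 until a segment
            // merge physically removes the deleted document.
        }
    }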
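The maxDoc/numDocs gap can also be checked directly against the index with Lucene (Solr 1.4 ships Lucene 2.9); the index path here is a placeholder:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;

    public class DeletedDocsCheck {
        public static void main(String[] args) throws Exception {
            IndexReader reader =
                IndexReader.open(FSDirectory.open(new File("/path/to/index")));
            // maxDoc still counts deleted-but-unmerged docs; numDocs does not.
            System.out.println("maxDoc  = " + reader.maxDoc());
            System.out.println("numDocs = " + reader.numDocs());
            System.out.println("pending deletes = " + (reader.maxDoc() - reader.numDocs()));
            reader.close();
        }
    }

Running this between commits should show the 3,975,393 vs 783,714 gap reported above; after an optimize the two numbers converge, since optimize rewrites the index into a single segment with no deletions.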
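And on the MD5 side: if the <uniqueKey> is the MD5 of the fetched content, any re-fetch of the same page produces the same key, so the add becomes an overwrite rather than a new document - which would explain 101,000,000 adds collapsing to well under a million numDocs. A sketch of that hashing (purely illustrative; how the key was actually computed isn't stated in the thread):

    import java.security.MessageDigest;

    public class Md5Key {
        // Hex-encode the MD5 of a document's content for use as <uniqueKey>.
        static String md5Hex(String content) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                                         .digest(content.getBytes("UTF-8"));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }

        public static void main(String[] args) throws Exception {
            // Identical fetched content always maps to the identical key,
            // so re-crawled duplicates overwrite instead of adding:
            System.out.println(md5Hex("same page body"));
            System.out.println(md5Hex("same page body"));  // same 32-char hex string
        }
    }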