Re: Storing/indexing speed drops quickly

Per Steffensen Mon, 23 Sep 2013 01:16:45 -0700

Now running the tests on a slightly reduced setup (2 machines, quadcore, 8GB RAM ...), but that doesn't matter.

We see that storing/indexing speed drops when using IndexWriter.updateDocument in DirectUpdateHandler2.addDoc. But it does not drop when just using IndexWriter.addDocument (update-requests with overwrite=false).

Using addDocument: https://dl.dropboxusercontent.com/u/25718039/AddDocument_2Solr8GB_DocCount.png
Using updateDocument: https://dl.dropboxusercontent.com/u/25718039/UpdateDocument_2Solr8GB_DocCount.png

We are not too happy about having to use addDocument, because that allows for duplicates, and we would really want to avoid that (on Solr/Lucene level).
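To make the duplicate concern concrete, here is a minimal sketch of the difference in semantics between the two calls. It is a toy in-memory model, not Lucene code: `ToyIndex`, `add_document` and `update_document` are hypothetical names, and the only behaviour borrowed from Lucene is the documented contract that updateDocument first deletes the documents matching a term and then adds the new document.

```python
class ToyIndex:
    """Toy stand-in for an index; it only models add vs. update semantics."""

    def __init__(self):
        self.docs = []

    def add_document(self, doc):
        # Blind append: nothing checks whether the id already exists.
        self.docs.append(doc)

    def update_document(self, key_field, key_value, doc):
        # Delete every doc whose key matches, then add the new doc.
        self.docs = [d for d in self.docs if d.get(key_field) != key_value]
        self.docs.append(doc)


added = ToyIndex()
added.add_document({"id": "42", "body": "first version"})
added.add_document({"id": "42", "body": "second version"})

updated = ToyIndex()
updated.update_document("id", "42", {"id": "42", "body": "first version"})
updated.update_document("id", "42", {"id": "42", "body": "second version"})

print(len(added.docs), len(updated.docs))  # 2 1
```

The model says nothing about speed. It only shows why the add path leaves two documents with the same id while the update path leaves one.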

We have confirmed that doubling the amount of total RAM will double the number of documents in the index at which the indexing speed starts dropping (when we use updateDocument). On https://dl.dropboxusercontent.com/u/25718039/UpdateDocument_2Solr8GB_DocCount.png you can see that the speed drops at around 120M documents. Running the same test, but with the Solr machines having 16GB RAM (instead of 8GB), the speed drops at around 240M documents.
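The arithmetic behind the two measurements can be written out as a short sketch. The two data points are the ones reported above; the helper `estimated_drop_point` is hypothetical, and extending the straight line to other RAM sizes is an assumption that has not been measured.

```python
# RAM per Solr machine (GB) -> approximate doc count where the speed drops,
# as reported for the two test runs.
measured = {8: 120_000_000, 16: 240_000_000}

# Documents indexed per GB of RAM before the slowdown starts.
docs_per_gb = {ram: docs / ram for ram, docs in measured.items()}


def estimated_drop_point(ram_gb, ratio=15_000_000):
    """Linear guess only; nothing beyond 8GB and 16GB has been tested."""
    return ram_gb * ratio


for ram, ratio in sorted(docs_per_gb.items()):
    print(f"{ram}GB: {ratio:,.0f} docs per GB")
```

Both runs give the same ratio of about 15M documents per GB, which is what "doubling RAM doubles the drop point" amounts to.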

Any comments on why indexing speed drops with IndexWriter.updateDocument but not with IndexWriter.addDocument?


Regards, Per Steffensen

On 9/12/13 10:14 AM, Per Steffensen wrote:

Seems like the attachments didn't make it through to this mailing list

https://dl.dropboxusercontent.com/u/25718039/doccount.png
https://dl.dropboxusercontent.com/u/25718039/iowait.png


On 9/12/13 8:25 AM, Per Steffensen wrote:
Hi
SolrCloud 4.0: 6 machines, quadcore, 8GB RAM, 1T disk, one Solr node on each, one collection across the 6 nodes, 4 shards per node.
Storing/indexing from 100 threads on external machines, each thread one doc at a time, full speed (they always have a new doc to store/index).
See attached images
* iowait.png: Measured I/O wait on the Solr machines
* doccount.png: Measured number of docs in Solr collection
Starting from an empty collection. Things are fine wrt storing/indexing speed for the first two to three hours (100M docs per hour), then speed goes down dramatically, to an, for us, unacceptable level (max 10M per hour). At the same time as speed goes down, we see that I/O wait increases dramatically. I am not 100% sure, but a quick investigation has shown that this is due to almost constant merging.
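For a sense of scale, the hourly figures can be broken down per second, per client thread and per node. This is plain arithmetic on the numbers given in this message; the constant and function names are made up for the sketch, and an even spread of load across threads and nodes is assumed.

```python
CLIENT_THREADS = 100
NODES = 6
SHARDS_PER_NODE = 4
FAST_DOCS_PER_HOUR = 100_000_000  # first two to three hours
SLOW_DOCS_PER_HOUR = 10_000_000   # reported maximum after the drop

TOTAL_SHARDS = NODES * SHARDS_PER_NODE


def per_second(docs_per_hour):
    return docs_per_hour / 3600


slowdown = FAST_DOCS_PER_HOUR / SLOW_DOCS_PER_HOUR

for label, rate in (("fast", FAST_DOCS_PER_HOUR), ("slow", SLOW_DOCS_PER_HOUR)):
    total = per_second(rate)
    print(f"{label}: {total:,.0f} docs/s total, "
          f"{total / CLIENT_THREADS:,.1f} per thread, "
          f"{total / NODES:,.0f} per node")
print(f"{TOTAL_SHARDS} shards, slowdown of at least {slowdown:.0f}x")
```

Since 10M per hour is given as a maximum, the factor of 10 is a lower bound on the slowdown.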
What to do about this problem?
I know that you can play around with mergeFactor and commit rate, but earlier tests show that this really does not seem to do the job - it might postpone the time when the problem occurs, but basically it is just a matter of time before merging exhausts the system.
Is there a way to totally avoid merging, and keep indexing speed at a high level, while still making sure that searches will perform fairly well when data amounts become big? (I guess without merging you will end up with lots and lots of "small" files, and I guess this is not good for search response time.)
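The trade-off behind mergeFactor can be illustrated with a small simulation. This is a deliberately simplified model of log-style merging, where every flush produces one unit-sized segment and any mergeFactor equally sized segments are merged into one. It is not the actual Lucene merge policy (TieredMergePolicy in particular behaves differently), and `simulate_log_merges` is a hypothetical helper.

```python
def simulate_log_merges(num_flushes, merge_factor):
    """Return (units rewritten by merges per unit flushed, final segment count)."""
    levels = [0]  # levels[i] = number of segments of size merge_factor**i
    units_rewritten = 0
    for _ in range(num_flushes):
        levels[0] += 1
        level = 0
        # Cascade: a full level is merged into one segment on the next level.
        while levels[level] == merge_factor:
            levels[level] = 0
            units_rewritten += merge_factor ** (level + 1)
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
    return units_rewritten / num_flushes, sum(levels)


for factor in (10, 100):
    rewritten, segments = simulate_log_merges(1000, factor)
    print(f"mergeFactor={factor}: each unit rewritten {rewritten:.1f} times, "
          f"{segments} segments left")
```

In this model a larger mergeFactor reduces the amount of data rewritten by merges but leaves more segments behind for searches to visit, which matches the observation that tuning it postpones the problem rather than removing it.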
Regards, Per Steffensen
