Re: Optimize SolrCloud without downtime

2015-03-31 Thread Erick Erickson
I really don't have a good explanation here; those are the default values and the folks who set them up no doubt chose them with some care. Afraid I'll have to defer to people who actually know the code...
Erick
On Mon, Mar 30, 2015 at 11:59 PM, Pavel Hladik wrote:
> When we indexing I see the d

Re: Optimize SolrCloud without downtime

2015-03-31 Thread Pavel Hladik
When we are indexing, I see the deleted docs count change a little. I was surprised that when a developer reindexed the 120M index, we had around 110M deleted docs and this number was not falling. As you wrote, the typical behavior should be that merging brings deleted docs down to 10-20% of the whole index? So it should be after two w
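For readers following the numbers in this message, here is a minimal sketch of how the deleted-docs share can be computed. It relies on the relationship deletedDocs = maxDoc - numDocs that Solr reports in its core statistics. The function name is hypothetical, and reading "120M index" as 120M live documents is an assumption, since the message does not say whether that figure includes deleted docs.

```python
def deleted_fraction(num_docs: int, max_doc: int) -> float:
    """Return the share of index entries that are deleted.

    num_docs is the count of live documents and max_doc counts live plus
    deleted documents, so deletedDocs = max_doc - num_docs.
    """
    if max_doc <= 0:
        return 0.0  # empty index: nothing to report
    if not 0 <= num_docs <= max_doc:
        raise ValueError("num_docs must be between 0 and max_doc")
    return (max_doc - num_docs) / max_doc


# Assumption: 120M live docs plus 110M deleted docs, as one reading of the message.
live = 120_000_000
deleted = 110_000_000
print(f"{deleted_fraction(live, live + deleted):.1%}")
```

Under that reading the deleted share comes to roughly 48%, well above the 10-20% range mentioned elsewhere in the thread.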

Re: Optimize SolrCloud without downtime

2015-03-30 Thread Erick Erickson
Hmmm, are you indexing during the time you see the deleted docs not changing? Because this is very strange. Theoretically, if you reindex everything, that should result in segments that have _no_ live docs in them and they should really disappear ASAP. One way to work around this if we determine t

Re: Optimize SolrCloud without downtime

2015-03-30 Thread Pavel Hladik
Hi, thanks for the reply. We have a lot of deleted docs because we have to reindex all records from time to time when changing some important parameters. An update means a create plus a delete. Our deleted docs do not disappear when segments merge. I see our deleted docs are almost the same number

Re: Optimize SolrCloud without downtime

2015-03-25 Thread Erick Erickson
bq: It does NOT optimize multiple replicas or shards in parallel. This behavior was changed in 4.10 though, see: https://issues.apache.org/jira/browse/SOLR-6264 So with 5.0 Pavel is seeing the result of that JIRA I bet. I have to agree with Shawn, the optimization step should proceed invisibly

Re: Optimize SolrCloud without downtime

2015-03-25 Thread Shawn Heisey
On 3/25/2015 9:08 AM, pavelhladik wrote:
> Our data are changing frequently so that's why so many deletedDocs.
> Optimized core takes around 50GB on disk, we are now almost on 100GB and I'm
> looking for best solution howto optimize this huge core without downtime. I
> know optimization working in
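For context on what an optimize request looks like, the sketch below builds the URL for Solr's update handler with the `optimize`, `maxSegments` and `waitSearcher` request parameters. It is only an illustration and not the approach recommended in this thread: the helper name, host and collection name are hypothetical, and the code only builds the URL rather than sending it.

```python
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str,
                 max_segments: int = 1, wait_searcher: bool = True) -> str:
    """Build the update-handler URL that asks Solr to optimize a collection."""
    params = {
        "optimize": "true",
        "maxSegments": str(max_segments),  # target number of segments after merging
        "waitSearcher": "true" if wait_searcher else "false",
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
print(optimize_url("http://localhost:8983/solr", "collection1"))
```

Sending a GET request to the resulting URL triggers the optimize, which rewrites segments and can temporarily need a lot of extra disk space, so it is worth checking free space first.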

Re: Optimize SolrCloud without downtime

2015-03-25 Thread Erick Erickson
That's a high number of deleted documents as a percentage of your index! Or at least I find those numbers surprising. When segments are merged in the background during normal indexing, quite a bit of weight is given to segments that have a high percentage of deleted docs. I usually see at most 10-2