Hi Modassar,

Have you tried sending the optimize request to each replica's core directly instead of to the collection? For example, if you had col_shard1_replica1 on node1, send the optimize command directly to that core's URL:

curl -i -v "http://host:port/solr/col_shard1_replica1/update" -H 'Content-type:application/xml' \
  --data-binary "<optimize/>"

I haven't tried this myself, but it might work ;-)

Tim

On Wed, Jul 9, 2014 at 12:59 AM, Modassar Ather <modather1...@gmail.com> wrote:

> Hi All,
>
> Thanks for your kind suggestions and inputs.
>
> We have been going the optimize way and it has helped. Testing and benchmarking around memory and performance have already been done. So we see scope for improving the optimize step by running it in parallel; kindly suggest how that can be achieved.
>
> Thanks,
> Modassar
>
>
> On Wed, Jul 9, 2014 at 11:48 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>
>> Hi Walter,
>>
>> I wonder why you think SolrCloud isn't necessary if you're indexing once per week. Aren't automatic failover and auto-sharding still useful? One can also do custom sharding with SolrCloud if necessary.
>>
>>
>> On Wed, Jul 9, 2014 at 11:38 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>>
>> > More memory or faster disks will make a much bigger improvement than a forced merge.
>> >
>> > What are you measuring? If it is average query time, that is not a good measure. Look at the 90th or 95th percentile. Test with queries from logs.
>> >
>> > No user can see a 10% or 20% difference. If your managers are watching that, they are watching the wrong thing.
>> >
>> > If you are indexing once per week, you don't really need the complexity of Solr Cloud. You can do manual sharding.
>> >
>> > wunder
>> >
>> > On Jul 8, 2014, at 10:55 PM, Modassar Ather <modather1...@gmail.com> wrote:
>> >
>> > > Our index has almost 100M documents running on SolrCloud with 3 shards, and each shard has an index size of about 700GB (for the record, we are not using stored fields; our documents are pretty large).
>> > > We perform a full indexing every weekend, and during the week no updates are made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts, etc., and take many minutes to execute. A difference of 10-20% is a big advantage for us.
>> > >
>> > > We have been optimizing the index after indexing for years, and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but it does not work well.
>> > >
>> > > Kindly provide your suggestions.
>> > >
>> > > Thanks,
>> > > Modassar
>> > >
>> > >
>> > > On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>> > >
>> > >> I seriously doubt that you are required to force merge.
>> > >>
>> > >> How much improvement? And is the big performance cost also OK?
>> > >>
>> > >> I have worked on search engines that do automatic merges and offer forced merges for over fifteen years. For all that time, forced merges have usually caused problems.
>> > >>
>> > >> Stop doing forced merges.
>> > >>
>> > >> wunder
>> > >>
>> > >> On Jul 8, 2014, at 10:09 PM, Modassar Ather <modather1...@gmail.com> wrote:
>> > >>
>> > >>> Thanks Walter for your inputs.
>> > >>>
>> > >>> Our use case and performance benchmark require us to invoke optimize.
>> > >>>
>> > >>> Here we see a chance to improve the performance of optimize() if it is invoked in parallel.
>> > >>> I found that if *distrib=false* is used, the optimization will happen in parallel.
>> > >>>
>> > >>> But I could not find a way to set it using HttpSolrServer/CloudSolrServer.
>> > >>> Also, the parameter settings given in my mail above do not seem to work.
>> > >>>
>> > >>> Please let me know how I can achieve a parallel optimize on SolrCloud.
>> > >>>
>> > >>> Thanks,
>> > >>> Modassar
>> > >>>
>> > >>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> > >>>
>> > >>>> You probably do not need to force merge (mistakenly called "optimize") your index.
>> > >>>>
>> > >>>> Solr does automatic merges, which work just fine.
>> > >>>>
>> > >>>> There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night.
>> > >>>>
>> > >>>> If you need Solr Cloud, I cannot think of a situation where you would want a forced merge.
>> > >>>>
>> > >>>> wunder
>> > >>>>
>> > >>>> On Jul 8, 2014, at 2:01 AM, Modassar Ather <modather1...@gmail.com> wrote:
>> > >>>>
>> > >>>>> Hi,
>> > >>>>>
>> > >>>>> I need to optimize an index created using the CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize().
>> > >>>>>
>> > >>>>> To make it parallel, I tried creating three separate HttpSolrServer instances and invoking httpSolrServer.optimize() on them in parallel, but it still seems to optimize sequentially.
>> > >>>>>
>> > >>>>> I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.
>> > >>>>> *URL*: http://host:port/solr/collection/update
>> > >>>>>
>> > >>>>> *Parameters*:
>> > >>>>> params.add(new BasicNameValuePair("optimize", "true"));
>> > >>>>> params.add(new BasicNameValuePair("maxSegments", "1"));
>> > >>>>> params.add(new BasicNameValuePair("waitFlush", "true"));
>> > >>>>> params.add(new BasicNameValuePair("distrib", "false"));
>> > >>>>>
>> > >>>>> Kindly provide your suggestion and help.
>> > >>>>>
>> > >>>>> Regards,
>> > >>>>> Modassar
>> > >>>>
>> > >>
>> > >> --
>> > >> Walter Underwood
>> > >> wun...@wunderwood.org
>> > >
>> >
>> > --
>> > Walter Underwood
>> > wun...@wunderwood.org
>> >
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
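Tim's per-core suggestion can be taken one step further by sending the requests to all cores at the same time instead of one after another. The following is a minimal bash sketch, assuming you already know the base URL of every replica core you want to optimize. The function name `optimize_cores` and the core URLs in the usage comment are hypothetical placeholders, not part of Solr. Like Tim's command, this has not been verified against a live SolrCloud cluster, and it does not settle whether Solr keeps a core-level optimize local or forwards it to the rest of the collection, so check each node's log when trying it.

```shell
#!/usr/bin/env bash
# Sketch: POST <optimize/> to several Solr cores concurrently.
# ASSUMPTION: each argument is a core base URL such as
# http://host:port/solr/col_shard1_replica1 (hypothetical placeholders).

optimize_cores() {
  local pids=() urls=() url i rc=0
  for url in "$@"; do
    # Run each request in the background so the cores are hit concurrently.
    # --fail makes curl exit non-zero on an HTTP error status from Solr.
    curl --fail -sS -o /dev/null \
      -H 'Content-type:application/xml' \
      --data-binary '<optimize/>' \
      "${url}/update" &
    pids+=("$!")
    urls+=("$url")
  done
  # Wait for every background request and report each core's outcome.
  for i in "${!pids[@]}"; do
    if wait "${pids[$i]}"; then
      echo "optimized: ${urls[$i]}"
    else
      echo "FAILED: ${urls[$i]}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage (hypothetical URLs, one per replica core):
# optimize_cores http://host1:port/solr/col_shard1_replica1 \
#                http://host2:port/solr/col_shard2_replica1 \
#                http://host3:port/solr/col_shard3_replica1
```

The `--fail` flag matters here: without it, curl exits 0 even when Solr answers with an HTTP 500, so `wait` would report every core as optimized.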