Hi,

There are 5 cores and a separate server for indexing on this SolrCloud. Could you please share your suggestions on the following:

How can the indexer know that the optimize has completed, even if the commit/optimize runs in the background, without going to the Solr servers themselves, perhaps by using SolrJ or some other API?

I tried but could not find any API/handler to check whether the optimization has completed. Kindly share your inputs.

Thanks,
Modassar

On Thu, Jun 4, 2015 at 9:36 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Can't get any failures to happen on my end so I really haven't a clue.
>
> Best,
> Erick
>
> On Thu, Jun 4, 2015 at 3:17 AM, Modassar Ather <modather1...@gmail.com> wrote:
> > Hi,
> >
> > Please provide your inputs on optimize and commit running as background. Your suggestion will be really helpful.
> >
> > Thanks,
> > Modassar
> >
> > On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather <modather1...@gmail.com> wrote:
> >
> >> Erick! I could not find any underlying setting of 10 minutes.
> >> It is not only optimize but commit is also behaving in the same fashion and is taking lesser time than usually had taken.
> >> As per my observation both are running in background.
> >>
> >> On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> I'm not talking about you setting a timeout, but the underlying connection timing out...
> >>>
> >>> The "10 minutes then the indexer exits" comment points in that direction.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <modather1...@gmail.com> wrote:
> >>> > I have not added any timeout in the indexer except zk client time out which is 30 seconds. I am simply calling client.close() at the end of indexing.
> >>> > The same code was not running in background for optimize with solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer.
> >>> >
> >>> > On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>> >
> >>> >> Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level.
> >>> >> At that point, the optimize is still running it's just the connection has been dropped....
> >>> >>
> >>> >> Shot in the dark.
> >>> >> Erick
> >>> >>
> >>> >> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <modather1...@gmail.com> wrote:
> >>> >> > I could not notice it but with my past experience of commit which used to take around 2 minutes is now taking around 8 seconds. I think this is also running as background.
> >>> >> >
> >>> >> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <modather1...@gmail.com> wrote:
> >>> >> >
> >>> >> >> The indexer takes almost 2 hours to optimize. It has a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient.
> >>> >> >> Once all the documents are indexed it invokes commit and optimize. I have seen that the optimize goes into background after 10 minutes and indexer exits.
> >>> >> >> I am not sure why this 10 minutes it hangs on indexer. This behavior I have seen in multiple iteration of the indexing of same data.
> >>> >> >>
> >>> >> >> There is nothing significant I found in log which I can share. I can see following in log.
> >>> >> >> org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >>> >> >>
> >>> >> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>> >> >>
> >>> >> >>> All strange of course. What do your Solr logs show when this happens?
> >>> >> >>> And how reproducible is this?
> >>> >> >>>
> >>> >> >>> Best,
> >>> >> >>> Erick
> >>> >> >>>
> >>> >> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira <u...@odoko.co.uk> wrote:
> >>> >> >>> > In this case, optimising makes sense, once the index is generated, you are not updating it.
> >>> >> >>> >
> >>> >> >>> > Upayavira
> >>> >> >>> >
> >>> >> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> >>> >> >>> >> Our index has almost 100M documents running on SolrCloud of 5 shards and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index. Most of the queries that we run are pretty complex with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc. and take many minutes to execute. A difference of 10-20% is also a big advantage for us.
> >>> >> >>> >>
> >>> >> >>> >> We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but find optimized index work well for us.
> >>> >> >>> >>
> >>> >> >>> >> Erick I was indexing today the documents and saw the optimize happening in background.
> >>> >> >>> >>
> >>> >> >>> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>> >> >>> >>
> >>> >> >>> >> > No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs).
> >>> >> >>> >> >
> >>> >> >>> >> > 8,000 cycles later there aren't any problems reported.
> >>> >> >>> >> >
> >>> >> >>> >> > Siiigggggh.
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com> wrote:
> >>> >> >>> >> > > Hi,
> >>> >> >>> >> > >
> >>> >> >>> >> > > Erick you mentioned about a unit test to test the optimize running in background. Kindly share your findings if any.
> >>> >> >>> >> > >
> >>> >> >>> >> > > Thanks,
> >>> >> >>> >> > > Modassar
> >>> >> >>> >> > >
> >>> >> >>> >> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com> wrote:
> >>> >> >>> >> > >
> >>> >> >>> >> > >> Thanks everybody for your replies.
> >>> >> >>> >> > >>
> >>> >> >>> >> > >> I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue.
> >>> >> >>> >> > >>
> >>> >> >>> >> > >> Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us.
> >>> >> >>> >> > >>
> >>> >> >>> >> > >> Thanks,
> >>> >> >>> >> > >> Modassar
> >>> >> >>> >> > >>
> >>> >> >>> >> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>> >> >>> >> > >>
> >>> >> >>> >> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, siigggghhhh.
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>> A unit test should be very simple to write though, maybe I can get to it today.
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>> Erick
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:
> >>> >> >>> >> > >>> >
> >>> >> >>> >> > >>> >
> >>> >> >>> >> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> >>> >> >>> >> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> >>> >> >>> >> > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background.
> >>> >> >>> >> > >>> >> > Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same.
> >>> >> >>> >> > >>> >> >
> >>> >> >>> >> > >>> >> > Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting.
> >>> >> >>> >> > >>> >>
> >>> >> >>> >> > >>> >> This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted.
> >>> >> >>> >> > >>> >>
> >>> >> >>> >> > >>> >> I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening.
> >>> >> >>> >> > >>> >
> >>> >> >>> >> > >>> > A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly.
> >>> >> >>> >> > >>> >
> >>> >> >>> >> > >>> > Upayavira
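
On the question at the top of this thread (how the indexer can tell that a background optimize has finished), the thread does not settle on an API. One possible approach, offered only as a sketch, is to poll each core's segment count until it drops to the maxSegments value passed to optimize (1 in cloudSolrClient.optimize(true, true, 1)). The helper below is plain Java with no Solr dependency. OptimizeWaiter, waitForSegments, and the IntSupplier probe are hypothetical names, and how the probe obtains the count (for example from the segmentCount value that Solr's Luke request handler at /admin/luke is understood to report) is an assumption to verify against your Solr version. If a core was already at the target segment count before the optimize started, this check passes immediately, so it is a heuristic rather than a completion signal.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: waits until a reported segment count reaches a target. */
final class OptimizeWaiter {

    private OptimizeWaiter() {}

    /**
     * Polls segmentCount until it is at most maxSegments or the timeout elapses.
     *
     * @param segmentCount  caller-supplied probe (assumption: it asks one Solr
     *                      core for its current number of segments)
     * @param maxSegments   the value passed to optimize, e.g. 1
     * @param pollMillis    pause between probes
     * @param timeoutMillis overall time budget
     * @return true if the target was reached, false on timeout or interruption
     */
    static boolean waitForSegments(IntSupplier segmentCount, int maxSegments,
                                   long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            // Probe first so an already-merged index returns without sleeping.
            if (segmentCount.getAsInt() <= maxSegments) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With five cores, the indexer would presumably run one such wait per core (each with its own probe) and treat the optimize as finished only when every call returns true.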