Can't get any failures to happen on my end, so I really haven't a clue.

Best,
Erick
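For anyone trying to reproduce this, Erick describes his harness further down the thread: a stand-alone program that endlessly adds documents and checks that every commit returns the correct number of docs. Below is a minimal sketch of that add/commit/verify loop in plain Java. SimpleIndex, InMemoryIndex and runCycles are hypothetical names, not SolrJ API. The in-memory fake only exercises the loop itself. A real run would implement SimpleIndex on top of a Solr client (add, commit, then a `*:*` query reading the hit count).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of an add/commit/verify harness. SimpleIndex and InMemoryIndex are
 * hypothetical stand-ins; a real run would back SimpleIndex with a Solr client.
 */
public class CommitHarness {

    /** Hypothetical minimal index abstraction. */
    public interface SimpleIndex {
        void add(String id);
        void commit();
        long count(); // number of documents currently visible to searches
    }

    /** In-memory fake: added docs become visible only after commit(). */
    public static class InMemoryIndex implements SimpleIndex {
        private final List<String> pending = new ArrayList<>();
        private final List<String> visible = new ArrayList<>();

        public void add(String id) { pending.add(id); }

        public void commit() {
            visible.addAll(pending);
            pending.clear();
        }

        public long count() { return visible.size(); }
    }

    /**
     * Runs `cycles` rounds of: add `docsPerCycle` docs, commit, then check that
     * the visible count equals the running total. Returns how many cycles failed.
     */
    public static int runCycles(SimpleIndex index, int cycles, int docsPerCycle) {
        long expected = index.count();
        int failures = 0;
        for (int c = 0; c < cycles; c++) {
            for (int d = 0; d < docsPerCycle; d++) {
                index.add("doc-" + c + "-" + d);
            }
            index.commit();
            expected += docsPerCycle;
            long actual = index.count();
            if (actual != expected) {
                failures++;
                System.err.println("cycle " + c + ": expected " + expected + ", got " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // 8,000 cycles mirrors the run Erick reports; the fake never fails.
        int failures = runCycles(new InMemoryIndex(), 8000, 10);
        System.out.println("failures: " + failures); // prints "failures: 0"
    }
}
```

With the in-memory fake every cycle passes. The interesting case is a real client whose visible count lags the running total right after commit() returns, which would suggest the commit is completing in the background.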
On Thu, Jun 4, 2015 at 3:17 AM, Modassar Ather <modather1...@gmail.com> wrote:

Hi,

Please share your input on optimize and commit running in the background. Your suggestions would be really helpful.

Thanks,
Modassar

On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather <modather1...@gmail.com> wrote:

Erick, I could not find any underlying 10-minute setting. It is not only optimize: commit is behaving the same way too, and is taking less time than it usually did. As far as I can tell, both are running in the background.

On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <erickerick...@gmail.com> wrote:

I'm not talking about you setting a timeout, but about the underlying connection timing out...

The "10 minutes, then the indexer exits" comment points in that direction.

Best,
Erick

On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <modather1...@gmail.com> wrote:

I have not added any timeout in the indexer except the ZooKeeper client timeout, which is 30 seconds. I am simply calling client.close() at the end of indexing. With solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer, the same code did not run optimize in the background.

On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <erickerick...@gmail.com> wrote:

Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level. At that point the optimize is still running; it's just that the connection has been dropped...

Shot in the dark.
Erick

On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <modather1...@gmail.com> wrote:

I did not notice it directly, but a commit that in my experience used to take around 2 minutes now takes around 8 seconds.
I think this is also running in the background.

On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <modather1...@gmail.com> wrote:

The indexer takes almost 2 hours to optimize. It adds batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient from multiple threads. Once all the documents are indexed, it invokes commit and optimize. I have seen that the optimize goes into the background after 10 minutes and the indexer exits. I am not sure why the indexer hangs for these 10 minutes. I have seen this behavior in multiple iterations of indexing the same data.

I found nothing significant in the log that I can share. I can see the following in the log:

org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerick...@gmail.com> wrote:

All strange, of course. What do your Solr logs show when this happens? And how reproducible is this?

Best,
Erick

On Wed, May 27, 2015 at 4:00 AM, Upayavira <u...@odoko.co.uk> wrote:

In this case, optimising makes sense: once the index is generated, you are not updating it.

Upayavira

On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:

Our index has almost 100M documents running on SolrCloud with 5 shards, and each shard has an index size of about 170+ GB (for the record, we are not using stored fields; our documents are pretty large).
We perform a full indexing every weekend, and during the week no updates are made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts, etc., and take many minutes to execute. Even a 10-20% difference is a big advantage for us.

We have been optimizing the index after indexing for years, and it has worked well for us. Every once in a while we upgrade Solr to the latest version and try running without optimizing, to save the many hours it takes to optimize such a huge index, but we find that the optimized index works well for us.

Erick, I was indexing documents today and saw the optimize happening in the background.

On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:

No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs).

8,000 cycles later, there aren't any problems reported.

Siiigggggh.

On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com> wrote:

Hi,

Erick, you mentioned a unit test to test the optimize running in the background. Kindly share your findings, if any.
Thanks,
Modassar

On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com> wrote:

Thanks, everybody, for your replies.

I have noticed the optimization running in the background every time I indexed. This is a 5-node cluster with solr-5.1.0, and it uses the CloudSolrClient. Kindly share your findings on this issue.

Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years, and it has worked well for us.

Thanks,
Modassar

On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:

Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits with openSearcher=true; see https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, siigggghhhh.

A unit test should be very simple to write, though; maybe I can get to it today.
Erick

On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:

A more important question is: why are you optimising? Generally it isn't recommended anymore, as it reduces the natural distribution of documents amongst segments and makes future merges more costly.

Upayavira

On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:

This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted.

I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished... except that there is no code for optimizing in CloudSolrClient at all... so I don't know where the different behavior would actually be happening.

On 5/21/2015 6:21 AM, Modassar Ather wrote:

I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize, and the optimization keeps on running in the background.

Kindly let me know whether this is by design, and how I can make my indexer wait until the optimization is over. Is there a configuration/parameter I need to set for this?

Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait until the optimize was over before exiting.