Hi,

Please provide your inputs on optimize and commit running in the background. Your suggestions will be really helpful.

Thanks,
Modassar

On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather <modather1...@gmail.com> wrote:

> Erick! I could not find any underlying setting of 10 minutes.
> It is not only optimize but commit is also behaving in the same fashion
> and is taking lesser time than usually had taken.
> As per my observation both are running in background.
>
> On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> I'm not talking about you setting a timeout, but the underlying
>> connection timing out...
>>
>> The "10 minutes then the indexer exits" comment points in that direction.
>>
>> Best,
>> Erick
>>
>> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <modather1...@gmail.com>
>> wrote:
>> > I have not added any timeout in the indexer except zk client time out which
>> > is 30 seconds. I am simply calling client.close() at the end of indexing.
>> > The same code was not running in background for optimize with solr-4.10.3
>> > and org.apache.solr.client.solrj.impl.CloudSolrServer.
>> >
>> > On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Are you timing out on the client request? The theory here is that it's
>> >> still a synchronous call, but you're just timing out at the client
>> >> level. At that point, the optimize is still running it's just the
>> >> connection has been dropped....
>> >>
>> >> Shot in the dark.
>> >> Erick
>> >>
>> >> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <modather1...@gmail.com>
>> >> wrote:
>> >> > I could not notice it but with my past experience of commit which used to
>> >> > take around 2 minutes is now taking around 8 seconds. I think this is also
>> >> > running as background.
>> >> >
>> >> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <modather1...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> The indexer takes almost 2 hours to optimize.
>> >> >> It has a multi-threaded add of batches of documents to
>> >> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
>> >> >> Once all the documents are indexed it invokes commit and optimize. I have
>> >> >> seen that the optimize goes into background after 10 minutes and indexer
>> >> >> exits.
>> >> >> I am not sure why this 10 minutes it hangs on indexer. This behavior I
>> >> >> have seen in multiple iteration of the indexing of same data.
>> >> >>
>> >> >> There is nothing significant I found in log which I can share. I can see
>> >> >> following in log.
>> >> >> org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>> >> >>
>> >> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerick...@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> All strange of course. What do your Solr logs show when this happens?
>> >> >>> And how reproducible is this?
>> >> >>>
>> >> >>> Best,
>> >> >>> Erick
>> >> >>>
>> >> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira <u...@odoko.co.uk> wrote:
>> >> >>> > In this case, optimising makes sense, once the index is generated, you
>> >> >>> > are not updating It.
>> >> >>> >
>> >> >>> > Upayavira
>> >> >>> >
>> >> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
>> >> >>> >> Our index has almost 100M documents running on SolrCloud of 5 shards and
>> >> >>> >> each shard has an index size of about 170+GB (for the record, we are not
>> >> >>> >> using stored fields - our documents are pretty large). We perform a full
>> >> >>> >> indexing every weekend and during the week there are no updates made to
>> >> >>> >> the index.
>> >> >>> >> Most of the queries that we run are pretty complex with hundreds of
>> >> >>> >> terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
>> >> >>> >> and take many minutes to execute. A difference of 10-20% is also a big
>> >> >>> >> advantage for us.
>> >> >>> >>
>> >> >>> >> We have been optimizing the index after indexing for years and it has
>> >> >>> >> worked well for us. Every once in a while, we upgrade Solr to the latest
>> >> >>> >> version and try without optimizing so that we can save the many hours it
>> >> >>> >> take to optimize such a huge index, but find optimized index work well
>> >> >>> >> for us.
>> >> >>> >>
>> >> >>> >> Erick I was indexing today the documents and saw the optimize happening
>> >> >>> >> in background.
>> >> >>> >>
>> >> >>> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerick...@gmail.com>
>> >> >>> >> wrote:
>> >> >>> >>
>> >> >>> >> > No results yet. I finished the test harness last night (not really a
>> >> >>> >> > unit test, a stand-alone program that endlessly adds stuff and tests
>> >> >>> >> > that every commit returns the correct number of docs).
>> >> >>> >> >
>> >> >>> >> > 8,000 cycles later there aren't any problems reported.
>> >> >>> >> >
>> >> >>> >> > Siiigggggh.
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1...@gmail.com>
>> >> >>> >> > wrote:
>> >> >>> >> > > Hi,
>> >> >>> >> > >
>> >> >>> >> > > Erick you mentioned about a unit test to test the optimize running in
>> >> >>> >> > > background. Kindly share your findings if any.
>> >> >>> >> > >
>> >> >>> >> > > Thanks,
>> >> >>> >> > > Modassar
>> >> >>> >> > >
>> >> >>> >> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1...@gmail.com>
>> >> >>> >> > > wrote:
>> >> >>> >> > >
>> >> >>> >> > >> Thanks everybody for your replies.
>> >> >>> >> > >>
>> >> >>> >> > >> I have noticed the optimization running in background every time I
>> >> >>> >> > >> indexed. This is 5 node cluster with solr-5.1.0 and uses the
>> >> >>> >> > >> CloudSolrClient. Kindly share your findings on this issue.
>> >> >>> >> > >>
>> >> >>> >> > >> Our index has almost 100M documents running on SolrCloud. We have been
>> >> >>> >> > >> optimizing the index after indexing for years and it has worked well for
>> >> >>> >> > >> us.
>> >> >>> >> > >>
>> >> >>> >> > >> Thanks,
>> >> >>> >> > >> Modassar
>> >> >>> >> > >>
>> >> >>> >> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerick...@gmail.com>
>> >> >>> >> > >> wrote:
>> >> >>> >> > >>
>> >> >>> >> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3, but
>> >> >>> >> > >>> involving hard commits openSearcher=true, see:
>> >> >>> >> > >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
>> >> >>> >> > >>> reproduce this at will, siigggghhhh.
>> >> >>> >> > >>>
>> >> >>> >> > >>> A unit test should be very simple to write though, maybe I can get to it
>> >> >>> >> > >>> today.
>> >> >>> >> > >>>
>> >> >>> >> > >>> Erick
>> >> >>> >> > >>>
>> >> >>> >> > >>>
>> >> >>> >> > >>>
>> >> >>> >> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <u...@odoko.co.uk> wrote:
>> >> >>> >> > >>> >
>> >> >>> >> > >>> >
>> >> >>> >> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
>> >> >>> >> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
>> >> >>> >> > >>> >> > I am using Solr-5.1.0.
>> >> >>> >> > >>> >> > I have an indexer class which invokes
>> >> >>> >> > >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits after the
>> >> >>> >> > >>> >> > invocation of optimize and the optimization keeps on running in the
>> >> >>> >> > >>> >> > background.
>> >> >>> >> > >>> >> > Kindly let me know if it is per design and how can I make my indexer to
>> >> >>> >> > >>> >> > wait until the optimization is over. Is there a configuration/parameter I
>> >> >>> >> > >>> >> > need to set for the same.
>> >> >>> >> > >>> >> >
>> >> >>> >> > >>> >> > Please note that the same indexer with cloudSolrServer.optimize(true, true, 1)
>> >> >>> >> > >>> >> > on Solr-4.10 used to wait till the optimize was over before exiting.
>> >> >>> >> > >>> >>
>> >> >>> >> > >>> >> This is very odd, because I could not get HttpSolrServer to optimize in
>> >> >>> >> > >>> >> the background, even when that was what I wanted.
>> >> >>> >> > >>> >>
>> >> >>> >> > >>> >> I wondered if maybe the Cloud object behaves differently with regard to
>> >> >>> >> > >>> >> blocking until an optimize is finished ... except that there is no code
>> >> >>> >> > >>> >> for optimizing in CloudSolrClient at all ... so I don't know where the
>> >> >>> >> > >>> >> different behavior would actually be happening.
>> >> >>> >> > >>> >
>> >> >>> >> > >>> > A more important question is, why are you optimising? Generally it isn't
>> >> >>> >> > >>> > recommended anymore as it reduces the natural distribution of documents
>> >> >>> >> > >>> > amongst segments and makes future merges more costly.
>> >> >>> >> > >>> >
>> >> >>> >> > >>> > Upayavira
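
Erick's "shot in the dark" quoted above (the call is still synchronous, but the client stops waiting while the optimize carries on) can be illustrated without Solr. What follows is only a sketch using plain java.util.concurrent. The class ClientTimeoutSketch and its methods startServerWork and clientGaveUp are hypothetical names made up for this illustration, the sleep stands in for a long-running optimize, and nothing here is SolrJ API or a confirmed explanation of the behavior reported in this thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class ClientTimeoutSketch {

    // Hypothetical stand-in for the server-side optimize: sleeps for
    // workMillis, then marks itself finished. Not a Solr call.
    static Future<?> startServerWork(ExecutorService server, long workMillis,
                                     AtomicBoolean finished) {
        return server.submit(() -> {
            try {
                Thread.sleep(workMillis);
                finished.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Waits like a blocking client with a read timeout. Returns true if the
    // caller stopped waiting before the work completed. The work is NOT
    // cancelled on timeout, mimicking a dropped connection.
    static boolean clientGaveUp(Future<?> work, long clientTimeoutMillis)
            throws Exception {
        try {
            work.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService server = Executors.newSingleThreadExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);

        // "Optimize" takes 500 ms, the client only waits 100 ms.
        Future<?> work = startServerWork(server, 500, finished);
        boolean gaveUp = clientGaveUp(work, 100);
        System.out.println("client returned early: " + gaveUp
                + ", work finished at that point: " + finished.get());

        // The work still runs to completion after the client has returned.
        work.get();
        System.out.println("work finished later: " + finished.get());
        server.shutdown();
    }
}
```

If a dropped connection were the cause, the indexer would return while the server keeps working, which matches the symptom described. Whether that is what actually happens with CloudSolrClient in 5.1.0 is not established here.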