Thanks so much for your help and for the explanations. Eventually, we will be doing several batches in parallel. But at least now I know where to look and can do some testing on various scenarios.
Since we may be doing a lot of heavy uploading (while still doing a lot of queries), having an autoCommit interval shorter than the autoSoftCommit interval does sound interesting and I will test it out. And then just disable softCommit on my batch uploads. Either way, I at least know where to focus my efforts.

-Kevin

On Wed, Aug 14, 2013 at 6:27 AM, Jason Hellman <jhell...@innoventsolutions.com> wrote:

> Kevin,
>
> I wouldn't have considered using softCommits at all based on what I
> understand from your use case. You appear to be loading in large batches,
> and softCommits are better aligned to NRT search where there is a steady
> stream of smaller updates that need to be available immediately.
>
> As Erick pointed out, soft commits are all about avoiding constant
> reopening of the index searcher…where by constant we mean every few
> seconds. Provided you can wait until your batch is completed, and that
> frequency is roughly a minute or more, you likely will find an
> old-fashioned hard commit (with openSearcher="true") will work just fine
> (YMMV).
>
> Jason
>
> On Aug 14, 2013, at 4:51 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > right, SOLR-5081 is possible but somewhat unlikely
> > given the fact that you actually don't have very many
> > nodes in your cluster.
> >
> > soft commits aren't relevant to the tlog, but here's
> > the thing. Your tlogs may get replayed
> > when you restart solr. If they're large, this may take
> > a long time. When you said you restarted Solr after
> > killing it, you might have triggered this.
> >
> > The way to keep tlogs small is to hard commit more
> > frequently (you should look at their size before
> > worrying about it though!). If you set openSearcher=false,
> > this is pretty inexpensive, all it really does is close
> > the current segment files, open new ones, and start a new
> > tlog file. It does _not_ invalidate caches, do autowarming,
> > all that expensive stuff.
> >
> > Your soft commit does _not_ improve performance! It is
> > just "less expensive" than a hard commit with
> > openSearcher=true. It _does_ invalidate caches, fire
> > off autowarming, etc. So it does "improve performance"
> > over doing hard commits with openSearcher=true
> > with the same frequency, but it still isn't free. It's still
> > good to have the soft commit interval as long as you
> > can tolerate.
> >
> > It's perfectly reasonable to have a hard commit interval
> > that's much shorter than your soft commit interval. As
> > Yonik explained once, "soft commits are about visibility
> > but hard commits are about durability".
> >
> > Best
> > Erick
> >
> > On Wed, Aug 14, 2013 at 2:20 AM, Kevin Osborn <kevin.osb...@cbsi.com> wrote:
> >
> >> Interesting, that did work. Do you or anyone else have any ideas or what I
> >> should look at? While soft commit is not a requirement in my project, my
> >> understanding is that it should help performance. On the same index, I will
> >> be doing both a large number of queries as well as updates.
> >>
> >> If I have to disable autoCommit, should I increase the chunk size?
> >>
> >> Of course, I will have to run a more large scale test tomorrow, but I saw
> >> this problem fairly consistently in my smaller test.
> >>
> >> In a previous experiment, I applied the SOLR-4816 patch that someone
> >> indicated might help. I also reduced the CSV upload chunk size to 500. It
> >> seemed like things got a little better, but still eventually hung.
> >>
> >> I also see SOLR-5081, but I don't know if that is my issue or not. At least
> >> in my test, the index writes are not parallel as in the ticket.
> >>
> >> -Kevin
> >>
> >> On Tue, Aug 13, 2013 at 8:40 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:
> >>
> >>> While I don't have a past history of this issue to use as reference, if I
> >>> were in your shoes I would consider trying your updates with softCommit
> >>> disabled.
> >>> My suspicion is you're experiencing some issue with the
> >>> transaction logging and how it's managed when your hard commit occurs.
> >>>
> >>> If you can give that a try and let us know how that fares we might have
> >>> some further input to share.
> >>>
> >>> On Aug 13, 2013, at 11:54 AM, Kevin Osborn <kevin.osb...@cbsi.com> wrote:
> >>>
> >>>> I am using Solr Cloud 4.4. It is pretty much a base configuration. We have
> >>>> 2 servers and 3 collections. Collection1 is 1 shard and the Collection2 and
> >>>> Collection3 both have 2 shards. Both servers are identical.
> >>>>
> >>>> So, here is my process, I do a lot of queries on Collection1 and
> >>>> Collection2. I then do a bunch of inserts into Collection3. I am doing CSV
> >>>> uploads. I am also doing custom shard routing. All the products in a single
> >>>> upload will have the same shard key. All Solr interaction is through SolrJ
> >>>> with full Zookeeper awareness. My uploads are also using soft commits.
> >>>>
> >>>> I tried this on a record set of 936 products. Everything worked fine. I
> >>>> then sent over a record set of 300k products. The upload into Collection3
> >>>> is chunked. I tried both 1000 and 200,000 with similar results. The first
> >>>> upload to Solr would just hang. There would simply be no response from
> >>>> Solr. A few of the products from this request would make it into the index,
> >>>> but not many.
> >>>>
> >>>> In this state, queries continued to work, but deletes did not.
> >>>>
> >>>> My only solution was to kill each Solr process.
> >>>>
> >>>> As an experiment, I did the large catalog first. First, I reset everything.
> >>>> With a chunk size of 1000, about 110,000 out of 300,000 records made it
> >>>> into Solr before the process hung. Again, queries worked, but deletes did
> >>>> not and I had to kill Solr. It hung after about 30 seconds.
> >>>> Timing-wise, this is at about the second autocommit cycle, given the
> >>>> default autocommit of 15 seconds. I am not sure if this is related or not.
> >>>>
> >>>> As an additional experiment, I ran the entire test with just a single node
> >>>> in the cluster. This time, everything ran fine.
> >>>>
> >>>> Does anyone have any ideas? Everything is pretty default. These servers are
> >>>> Azure VMs, although I have seen similar behavior running two Solr instances
> >>>> on a single internal server as well.
> >>>>
> >>>> I had also noticed similar behavior before with Solr 4.3. It definitely has
> >>>> something to do with the clustering, but I am not sure what. And I don't see
> >>>> any error message (or really anything else) in the Solr logs.
> >>>>
> >>>> Thanks.
> >>>>
> >>>> --
> >>>> *KEVIN OSBORN*
> >>>> LEAD SOFTWARE ENGINEER
> >>>> CNET Content Solutions
> >>>> OFFICE 949.399.8714
> >>>> CELL 949.310.4677 SKYPE osbornk
> >>>> 5 Park Plaza, Suite 600, Irvine, CA 92614
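
For reference, a minimal sketch of how the commit settings discussed in this thread might look in the updateHandler section of solrconfig.xml. The 15-second hard commit is the default interval mentioned in the thread; the 60-second soft commit is only an assumed value for illustration (the thread says just "a minute or more"), and the right numbers depend on how stale search results are allowed to be and how large the tlogs grow.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log (tlog); replayed on restart, so keep it small -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit: durability. With openSearcher=false it does not
       open a new searcher, so caches are not invalidated or rewarmed. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: visibility. Assumed 60 s here; make it as long as
       the application can tolerate, or remove it for pure batch loads. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

For batch-only loading, an alternative along the lines Jason suggests is to drop autoSoftCommit and issue one explicit hard commit that opens a new searcher once the batch has finished.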