We do not set a connection timeout (conn_timeout) or a read timeout (read_timeout) on the HttpClient in SnapPuller.
I think both should be set: a very high value for the read timeout, say 1 hour, and about 1 minute for the connection timeout. We can also make them configurable.
--Noble

On Tue, Mar 24, 2009 at 2:13 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> We should obviously get to the bottom of this. But I was thinking, should we
> have some sort of timeouts on the SnapPuller in the slave to avoid such
> scenarios? Locking out snap pulls forever is not a good idea.
>
> On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>
>> So this is only one slave that hangs up and not the master?
>> Can you get thread dumps on both the master and the slave during a hang?
>>
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>> On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn <jnewb...@zappos.com>
>> wrote:
>> > We are having an intermittent problem with replication. We reindex nightly
>> > which usually means there are 2 commits during replication then a final
>> > commit/optimize at the end. For some reason the replication will hang
>> > occasionally with the following screenshot. This is frustrating as it will
>> > completely stall out any further replications. Additionally, it seems to
>> > only happen on reindex and it will strike 1 server randomly but not always
>> > the same server.
>> >
>> >
>> > In case the screen shot doesn’t come through:
>> >
>> > Master http://10.66.209.38:8080/solr/zeta-main/replication
>> > Latest Index Version:1233423827699, Generation: 6237
>> > Replicatable Index Version:0, Generation: 0
>> > Poll Interval 00:05:00
>> > Local Index Index Version: 1233423827684, Generation: 6222
>> > Location: /opt/solr-data/zeta-main/index
>> > Size: 1.29 GB
>> > Times Replicated Since Startup: 3591
>> > Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
>> > Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
>> > Config Files Replicated: [synonyms.txt]
>> > Times Config Files Replicated Since Startup: 4
>> > Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009
>> > Current Replication Status Start Time: Mon Mar 23 00:22:55 PDT 2009
>> > Files Downloaded: 12 / 163
>> > Downloaded: 4.12 MB / 1.41 GB [0.0%]
>> > Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
>> > Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163 bytes/s
>> >
>> >
>> >
>> > --
>> > Jeff Newburn
>> > Software Engineer, Zappos.com
>> > jnewb...@zappos.com - 702-943-7562
>> >
>> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

--
--Noble Paul
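A minimal sketch of the timeout proposal at the top of this message: 1 minute to connect, 1 hour per read, both overridable from configuration. It uses java.net.HttpURLConnection as a stand-in for the HttpClient instance that SnapPuller actually uses, and the class name ReplicationTimeouts and the config keys httpConnTimeout / httpReadTimeout are hypothetical, not confirmed Solr names. With a read timeout set, a read that receives no data within that window fails with java.net.SocketTimeoutException instead of blocking indefinitely.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.Map;

/** Hypothetical helper: applies configurable connect/read timeouts to a replication download. */
public class ReplicationTimeouts {
    // Defaults from the proposal: 1 minute to connect, 1 hour per blocking read.
    static final int DEFAULT_CONN_TIMEOUT_MS = 60 * 1000;
    static final int DEFAULT_READ_TIMEOUT_MS = 60 * 60 * 1000;

    // Hypothetical config keys, assumed for illustration only.
    static final String CONN_TIMEOUT_KEY = "httpConnTimeout";
    static final String READ_TIMEOUT_KEY = "httpReadTimeout";

    final int connTimeoutMs;
    final int readTimeoutMs;

    ReplicationTimeouts(Map<String, String> config) {
        this.connTimeoutMs = parseOrDefault(config.get(CONN_TIMEOUT_KEY), DEFAULT_CONN_TIMEOUT_MS);
        this.readTimeoutMs = parseOrDefault(config.get(READ_TIMEOUT_KEY), DEFAULT_READ_TIMEOUT_MS);
    }

    /** Falls back to the default when the key is absent or blank; rejects negative values. */
    private static int parseOrDefault(String value, int fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        int parsed = Integer.parseInt(value.trim());
        if (parsed < 0) {
            throw new IllegalArgumentException("timeout must be >= 0 ms: " + value);
        }
        return parsed; // 0 means "no timeout" in HttpURLConnection semantics
    }

    /** Creates (but does not yet open) a connection with both timeouts applied. */
    HttpURLConnection open(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(connTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Override only the connect timeout; the read timeout keeps its 1-hour default.
        ReplicationTimeouts timeouts = new ReplicationTimeouts(Map.of(CONN_TIMEOUT_KEY, "30000"));
        URL master = URI.create("http://10.66.209.38:8080/solr/zeta-main/replication").toURL();
        HttpURLConnection conn = timeouts.open(master); // openConnection() does not connect yet
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout());
    }
}
```

Note that the read timeout bounds each blocking read, not the whole transfer, so a connection that keeps trickling a few bytes per second would not trip it; catching a fully stalled file still depends on the read timeout being finite.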