Still no luck starting Solr with a 40s zkClientTimeout. I'm not seeing any
expired sessions...

There must be a way to start Solr with many collections. It runs fine
until a restart is required.
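
For anyone following along: the 40-second ceiling mentioned in the quoted
reply below comes from ZooKeeper's session-timeout bounds. By default the
server clamps a client's requested timeout to between 2 and 20 times
tickTime. A minimal Python sketch of that arithmetic follows; the function
name and its parameters are made up for illustration and are not a Solr or
ZooKeeper API.

```python
# Sketch of how a ZooKeeper server bounds a requested session timeout.
# Assumes the documented defaults: minSessionTimeout = 2 * tickTime and
# maxSessionTimeout = 20 * tickTime, unless overridden in zoo.cfg.
# negotiated_session_timeout_ms is a hypothetical helper, not a real API.

def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                  min_session_timeout_ms=None,
                                  max_session_timeout_ms=None):
    """Return the session timeout the server would grant, in milliseconds."""
    lower = (2 * tick_time_ms if min_session_timeout_ms is None
             else min_session_timeout_ms)
    upper = (20 * tick_time_ms if max_session_timeout_ms is None
             else max_session_timeout_ms)
    # Clamp the client's request into the [lower, upper] range.
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # zkClientTimeout=40000 with the default 2 s tickTime is granted as-is.
    print(negotiated_session_timeout_ms(40000))
    # Asking for 60 s is silently capped at 20 * tickTime.
    print(negotiated_session_timeout_ms(60000))
    # Raising maxSessionTimeout on the server lifts the cap.
    print(negotiated_session_timeout_ms(60000, max_session_timeout_ms=90000))
```

So a zkClientTimeout above 40000 ms would only take effect if
maxSessionTimeout (or tickTime) is also raised on the ZooKeeper side.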

On 3 March 2015 at 03:33, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/2/2015 12:54 AM, Damien Kamerman wrote:
> > I still see the same cloud startup issue with Solr 5.0.0. I created 4,000
> > collections from scratch and then attempted to stop/start the cloud.
> >
> > node1:
> > WARN  - 2015-03-02 18:09:02.371; org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog
> > WARN  - 2015-03-02 18:10:07.196; org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes published as DOWN in our cluster state.
> > WARN  - 2015-03-02 18:13:46.238; org.apache.solr.cloud.ZkController; Still seeing conflicting information about the leader of shard shard1 for collection DDDDDD-3219 after 30 seconds; our state says http://host:8002/solr/DDDDDD-3219_shard1_replica1/, but ZooKeeper says http://host:8000/solr/DDDDDD-3219_shard1_replica2/
> >
> > node2:
> > WARN  - 2015-03-02 18:09:01.871; org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog
> > WARN  - 2015-03-02 18:17:04.458; org.apache.solr.common.cloud.ZkStateReader$3; ZooKeeper watch triggered, but Solr cannot talk to ZK
> > stop/start
> > WARN  - 2015-03-02 18:53:12.725; org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog
> > WARN  - 2015-03-02 18:56:30.702; org.apache.solr.cloud.ZkController; Still seeing conflicting information about the leader of shard shard1 for collection DDDDDD-3581 after 30 seconds; our state says http://host:8001/solr/DDDDDD-3581_shard1_replica2/, but ZooKeeper says http://host:8002/solr/DDDDDD-3581_shard1_replica1/
> >
> > node3:
> > WARN  - 2015-03-02 18:09:03.022; org.eclipse.jetty.server.handler.RequestLogHandler; !RequestLog
> > WARN  - 2015-03-02 18:10:08.178; org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes published as DOWN in our cluster state.
> > WARN  - 2015-03-02 18:13:47.737; org.apache.solr.cloud.ZkController; Still seeing conflicting information about the leader of shard shard1 for collection DDDDDD-2707 after 30 seconds; our state says http://host:8002/solr/DDDDDD-2707_shard1_replica2/, but ZooKeeper says http://host:8000/solr/DDDDDD-2707_shard1_replica1/
>
> I'm sorry to hear that 5.0 didn't fix the problem.  I really hoped that
> it would.
>
> There is one other thing I'd like to try before you file a bug --
> increasing zkClientTimeout to 40 seconds, to see whether it changes the
> point at which it fails (or allows it to succeed).  With the
> default tickTime (2 seconds), the maximum time you can set
> zkClientTimeout to is 40 seconds ... which in normal circumstances is a
> VERY long time.  In your situation, at least with the code in its
> current state, 30 seconds (I'm pretty sure this is the default in 5.0)
> may simply not be enough.
>
>
> https://cwiki.apache.org/confluence/display/solr/Parameter+Reference#ParameterReference-SolrCloudInstanceZooKeeperParameters
>
> I think filing a bug, even if 40 seconds allows this to succeed, is a
> good idea ... but you might want to wait for some of the cloud experts
> to look at your logs to see if they have anything to add.
>
> Thanks,
> Shawn
>
>


-- 
Damien Kamerman
