Any help on this is much appreciated. Is it better to use more cores for ZooKeeper (as opposed to a 1-core machine)?

On Wed, Mar 12, 2014 at 4:28 PM, Chris W <chris1980....@gmail.com> wrote:

> Hi Furkan
>
> Load on the network is very low when a read workload is on the cluster.
> During indexing, a few of my "commits" hang forever while the Solr nodes
> are trying to get a connection from ZooKeeper. The peer communication
> between the zk nodes is very good and I haven't seen any issues. The
> network transfer is around 15-20 mBps when I restart a Solr node.
>
> *Infrastructure*: 10-node SolrCloud cluster with a 3-node zk ensemble
> (m1.medium instances with a 1-core CPU, 1.5 GB of heap out of a total of
> 3 GB RAM). Solr logs are on the same mount as the Solr data and tlogs. Zk
> logs are also on the same mount as the zk data. I have 80+ collections,
> which can easily grow to 150-200.
>
> *Regarding ZK Data*
>
> Why does 50 MB pose a problem if none of the system parameters are in an
> alarming state? I have 80+ collections in Solr, and every collection has
> the same schema but a different solrconfig.xml. Hence I am bundling each
> schema and config into a different zk folder and pushing that as a
> separate config. Is there a way in Solr/ZooKeeper to use one config for
> the common files (like the velocity templates and schema) and push just
> the solrconfig.xml into another config directory? I am sure that at least
> 90% of the 50 MB is duplicated across configs.
>
> Kindly advise, and thanks for your response.
>
> On Wed, Mar 12, 2014 at 4:08 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>
>> Hi;
>>
>> The FAQ page says:
>>
>> *Q: I'm seeing lots of session timeout exceptions - what to do?*
>> *A: Try raising the ZooKeeper session timeout by editing solr.xml - see
>> the zkClientTimeout attribute. The minimum session timeout is 2 times
>> your ZooKeeper defined tickTime. The maximum is 20 times the tickTime.
>> The default tickTime is 2 seconds. You should avoid raising this for no
>> good reason, but it should be high enough that you don't see a lot of
>> false session timeouts due to load, network lag, or garbage collection
>> pauses. The default timeout is 15 seconds, but some environments might
>> need to go as high as 30-60 seconds*.
>>
>> So when that happens, what is the load on your network? Do you get those
>> timeouts during heavy indexing or at idle time? If not, there is likely
>> a network problem. Could you check whether a problem exists "between"
>> the nodes of your ZooKeeper ensemble? Also, could you give some more
>> information about your infrastructure and Solr logs? (PS: 50 MB of data
>> *may* cause a problem for your architecture)
>>
>> Thanks;
>> Furkan KAMACI
>>
>>
>> 2014-03-13 0:57 GMT+02:00 Chris W <chris1980....@gmail.com>:
>>
>> > Hi
>> >
>> > I have a 3-node zk ensemble. I see very high latency for zk responses
>> > and also a lot of outstanding requests (on the order of 30-40).
>> >
>> > I also see that the requests are not going to all ZooKeeper nodes
>> > equally. One node has more requests/connections than the others.
>> > CPU, memory, and disk usage are all normal (under 30% CPU, disk reads
>> > on the order of KB, JVM size is 2 GB but it hasn't even reached 30%
>> > usage). The size of the data in zk is around 50 MB.
>> >
>> > I also see a few zk timeouts for SolrCloud nodes, causing them to be
>> > shown as "dead" in the cloud view. I have increased the connection
>> > timeout to around 3 minutes and the same issue still seems to be
>> > happening.
>> >
>> > How do I make zk respond faster to requests, and where does zk usually
>> > spend time while dealing with incoming requests?
>> >
>> > Any pointers on how to move forward would be great.
>> >
>> > --
>> > Best
>> > --
>> > C
>
> --
> Best
> --
> C

--
Best
--
C
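
The session-timeout bounds quoted from the FAQ above can be worked through with a small sketch. It assumes the ZooKeeper server clamps whatever timeout the client requests into the range [2 x tickTime, 20 x tickTime] unless minSessionTimeout / maxSessionTimeout are set explicitly in zoo.cfg. The function name and parameters are hypothetical and only illustrate the arithmetic; this is not Solr or ZooKeeper code.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Estimate the session timeout a ZooKeeper server would grant.

    Hypothetical helper: assumes the server clamps the requested value
    to [2 * tickTime, 20 * tickTime] unless explicit bounds are given.
    """
    lower = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    upper = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    # Clamp the client's request into the server-side window.
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # 15 s is the default quoted in the FAQ; 180 s is the "3 minutes" from the thread.
    for requested in (15_000, 60_000, 180_000):
        granted = negotiated_session_timeout(requested)
        print(f"requested {requested} ms -> granted {granted} ms")
```

Under these assumptions, with the default tickTime of 2 seconds, a client asking for 3 minutes would still be granted at most 40 seconds. If the 3-minute value in the thread was set through zkClientTimeout, that may be why raising it made no visible difference; raising tickTime or maxSessionTimeout on the ZooKeeper side would be needed for a larger value to take effect.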