Hello Furkan,

We are planning to migrate to a 3-node ensemble, but for now we have only one active ZooKeeper instance in production.
Actually, I was thinking of a parameter somewhere in the Solr configuration. I may be wrong, but I thought the problem was that Solr asks or tells ZooKeeper to update its state, and ZooKeeper cannot do so while it is busy garbage collecting. Nevertheless, I will try modifying the tickTime parameter.

As for the second point, I will ask my boss whether I can add our company to your wiki.

Metin

-----Original Message-----
From: Furkan KAMACI [mailto:furkankam...@gmail.com]
Sent: Monday, 10 March 2014 14:26
To: solr-user@lucene.apache.org
Subject: Re: Zookeeper will not update cluster state when garbaging

Hi Metin;

I think the timeout value you are talking about is described here: http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html

However, it is not recommended to change ZooKeeper's timeout value "if you do not have a specific reason". On the other hand, how many ZooKeeper instances do you have in your infrastructure?

Also, unrelated to your question: if it is OK with you, could you add your company here: https://wiki.apache.org/solr/PublicServers

This may be nice for people who wonder which companies use Solr.

Thanks;
Furkan KAMACI

2014-03-10 12:35 GMT+02:00 OSMAN Metin <metin.os...@canal-plus.com>:

> Hi all,
>
> we are using SolrCloud with this configuration:
>
> * Solr 4.4.0
> * ZooKeeper 3.4.5
> * one server with ZooKeeper + 4 Solr nodes
> * one server with 4 Solr nodes
> * only one core
> * Solr instances deployed on Tomcat with mod_cluster
> * clients access with SolrJ through Apache + mod_cluster
>
> In the morning, we have massive updates (several thousand in a few
> minutes) with explicit softCommit=true.
> These updates are load balanced across every node, regardless of whether
> a node is the leader or not.
>
> When this happens, the Solr Cloud admin console shows 7 nodes as
> recovering and the leader as active.
> We also noticed that refreshing the graph takes a very long time.
> This situation can last 3 hours until the clusterstate refreshes.
> During this phase, ZooKeeper is garbage collecting heavily (I can post
> the Munin GC graphs).
>
> Here are the command-line parameters of the ZooKeeper and Solr nodes (I
> have replaced some values with XXX for confidentiality reasons).
>
> ZooKeeper:
>
> java -cp
> /var/lib/zookeeper/bin/../build/classes:/var/lib/zookeeper/bin/../build/lib/*.jar:/var/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/var/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/var/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/var/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/var/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/var/lib/zookeeper/bin/../zookeeper-3.4.5.jar:/var/lib/zookeeper/bin/../src/java/lib/*.jar:/app/zookeeper/conf:
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=XXX -Xms384m -Xmx384m
> -XX:MaxPermSize=128m -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> /app/zookeeper/conf/zoo.cfg
>
> Solr:
>
> /usr/lib/jvm/java/bin/java
> -Dsolr.data.dir=/app/solr/server/search_01/vod/data
> -Dsolr.solr.home=/app/solr/server/search_01 -DnumShards=1
> -Dbootstrap_confdir=/app/solr/server/search_01/vod/conf
> -Dcollection.configName=vod -DzkHost=XXX:2181 -Dtomcat.server.port=XXX
> -Dtomcat.http.port=XXX -Dtomcat.ajp.port=XXX
> -Dlog4j.configuration=file:///app/tomcat/server/search_01/conf/log4j.properties
> -Djboss.jvmRoute=SEARCH_02_01
> -Djboss.modcluster.sendToApacheDelayInSec=10
> -Djboss.modcluster.nodetimeout=30 -Djboss.modcluster.ttl=10 -Xms2048m
> -Xmx2048m -XX:MaxPermSize=384m -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=XXX
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false -classpath
> :/app/tomcat/server/search_01/bin/bootstrap.jar:/app/tomcat/server/search_01/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
> -Dcatalina.base=/app/tomcat/server/search_01
> -Dcatalina.home=/app/tomcat/server/search_01 -Djava.endorsed.dirs=
> -Djava.io.tmpdir=/app/tomcat/server/search_01/temp
> -Djava.util.logging.config.file=/app/tomcat/server/search_01/conf/log4j.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> org.apache.catalina.startup.Bootstrap start
>
> I have tried other GC strategies, max heap values, new ratio, etc.
> on ZooKeeper without success.
> Every time ZooKeeper is garbage collecting, the clusterstate is not correct.
>
> Is this a bug in ZooKeeper or Solr 4.4.0, or is it due to some
> misconfiguration?
> I have seen somewhere that there is a timeout value between Solr and
> ZooKeeper, but I don't know where it is set (or what its default value is).
>
> Any help will be appreciated.
>
> Regards,
> Metin
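
On the timeout question raised in the thread, a rough sketch of how the two settings mentioned (ZooKeeper's tickTime and the client timeout requested by Solr, which is zkClientTimeout in solr.xml) may interact. ZooKeeper clamps the session timeout a client asks for to the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. The function name below is hypothetical, the tickTime of 2000 ms is the value from ZooKeeper's sample configuration (the zoo.cfg used in this thread was not posted), and the 15000 ms request is only an illustrative value, not a confirmed default for this installation.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Return the session timeout ZooKeeper would grant, in milliseconds.

    Hypothetical helper for illustration only. It mirrors the documented
    rule that the requested timeout is clamped to
    [minSessionTimeout, maxSessionTimeout], where the bounds default to
    2 * tickTime and 20 * tickTime when not set explicitly.
    """
    if tick_time_ms <= 0:
        raise ValueError("tick_time_ms must be positive")
    lower = (min_session_timeout_ms if min_session_timeout_ms is not None
             else 2 * tick_time_ms)
    upper = (max_session_timeout_ms if max_session_timeout_ms is not None
             else 20 * tick_time_ms)
    if lower > upper:
        raise ValueError("minimum session timeout exceeds maximum")
    # Clamp the client's request into the server-side bounds.
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # Illustrative client request of 15 s with an assumed tickTime of 2 s.
    print(negotiated_session_timeout(15000))
    # A 60 s request is capped at 20 * tickTime unless tickTime is raised.
    print(negotiated_session_timeout(60000))
    print(negotiated_session_timeout(60000, tick_time_ms=4000))
```

If this reading is right, raising tickTime only widens the range of timeouts that clients are allowed to request; the value Solr actually asks for would still have to be raised on the Solr side for a longer session timeout to take effect.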