Author: rjung
Date: Fri Mar 20 11:34:42 2009
New Revision: 756423

URL: http://svn.apache.org/viewvc?rev=756423&view=rev
Log:
Add some more documentation about local and global error states
and error_escalation_time.
Modified:
    tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml
    tomcat/connectors/trunk/jk/xdocs/reference/workers.xml

Modified: tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml
URL: http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml?rev=756423&r1=756422&r2=756423&view=diff
==============================================================================
--- tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml (original)
+++ tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml Fri Mar 20 11:34:42 2009
@@ -334,7 +334,72 @@
 load balancer.
 </p>
 </subsection>
+</section>
+<section name="Load Balancer Error Detection">
+<subsection name="Local and Global Error States">
+<p>
+A load balancer worker does not only have the ability to balance load.
+It also handles stickiness and failover of requests in case of errors.
+When a load balancer detects an error on one of its members, it needs to
+decide whether the error is serious, or only a temporary error or maybe
+only related to the actual request that was processed. Temporary errors
+are called local errors; serious errors are called global errors.
+</p>
+<p>
+If the load balancer decides that a backend should be put into the global error
+state, then the web server will not send any more requests there. If no session
+replication is used, this means that all user sessions located on the respective
+backend are no longer available. The users will be sent to another backend
+and will have to log in again. So the global error state is not transparent to the
+users. The application is still available, but users might lose some work.
+</p>
+<p>
+In some cases the decision between local error and global error is easy.
+For instance, if there is an error sending back the response to the client (browser),
+then it is very unlikely that the backend is broken.
+So this situation is a typical example of a local error.
+</p>
+<p>
+Some situations are harder to judge, though. If the load balancer can't establish
+a new connection to a backend, it could be because of a temporary overload situation
+(so no more free threads in the backend), or because the backend isn't alive any more.
+Depending on the details, the right state could either be local error or global error.
+</p>
+</subsection>
+<subsection name="Error Escalation Time">
+<p>
+Until version 1.2.26 most errors were interpreted as global errors.
+Starting with version 1.2.27 many errors which were previously interpreted as global
+were switched to being local whenever the backend is still busy. Busy means that
+other concurrent requests are being sent to the same backend (successful or not).
+</p>
+<p>
+In many cases there is no perfect way of making the decision
+between local and global error. The load balancer simply doesn't have enough information.
+Starting with version 1.2.28 you can tune how fast the load balancer switches from local error to
+global error. If a member of a load balancer stays in local error state for too long,
+the load balancer will escalate it into global error state.
+</p>
+<p>
+The time tolerated in local error state is controlled by the load balancer attribute
+<b>error_escalation_time</b> (in seconds). The default value is half of <b>recover_time</b>,
+so unless you changed <b>recover_time</b> the default is 30 seconds.
+</p>
+<p>
+Using a smaller value for <b>error_escalation_time</b> will make the load balancer react
+faster to serious errors, but also carries the risk of losing sessions more often
+in not so serious situations. You can lower <b>error_escalation_time</b> down to 0 seconds,
+which means all local errors which are potentially serious are escalated to global errors
+immediately.
+</p>
+<p>
+Note that without good basic error detection the whole escalation procedure is useless.
+So you should definitely use <b>socket_connect_timeout</b> and activate CPing/CPong
+with <b>ping_mode</b> and <b>ping_timeout</b> before thinking about also tuning
+<b>error_escalation_time</b>.
+</p>
+</subsection>
 </section>
 </body>

Modified: tomcat/connectors/trunk/jk/xdocs/reference/workers.xml
URL: http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/xdocs/reference/workers.xml?rev=756423&r1=756422&r2=756423&view=diff
==============================================================================
--- tomcat/connectors/trunk/jk/xdocs/reference/workers.xml (original)
+++ tomcat/connectors/trunk/jk/xdocs/reference/workers.xml Fri Mar 20 11:34:42 2009
@@ -932,11 +932,6 @@
 will the node be put into error state.
 </p>
 <p>
-Do not set <b>error_escalation_time</b> to a very short time unless you understand
-the implications. Use <b>socket_connect_timeout</b> and activate CPing/CPong
-with <b>ping_mode</b> and <b>ping_timeout</b> if you want fast error detection.
-</p>
-<p>
 This feature has been added in <b>jk 1.2.28</b>.
 </p>
 </directive>
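
For illustration only, the attributes named in the new timeouts.xml text could be combined in a workers.properties fragment roughly like the one below. This is a minimal sketch, not part of the commit: the worker names (lb, node1, node2), host names, port and all timeout values are assumptions chosen for the example, and the millisecond units for socket_connect_timeout and ping_timeout follow my reading of the workers.properties reference.

```properties
# Hypothetical setup: one load balancer with two AJP members.
worker.list=lb

# Basic error detection on each member, to be in place before
# tuning escalation (values are examples, not recommendations).
worker.node1.type=ajp13
worker.node1.host=app1.example.com
worker.node1.port=8009
# Connect timeout, assumed to be in milliseconds
worker.node1.socket_connect_timeout=5000
# Activate CPing/CPong; "A" is assumed to enable all probing modes
worker.node1.ping_mode=A
# CPing/CPong reply timeout, assumed to be in milliseconds
worker.node1.ping_timeout=10000

worker.node2.type=ajp13
worker.node2.host=app2.example.com
worker.node2.port=8009
worker.node2.socket_connect_timeout=5000
worker.node2.ping_mode=A
worker.node2.ping_timeout=10000

worker.lb.type=lb
worker.lb.balance_workers=node1,node2
# Seconds; 60 is the value implied by the documented 30 second default below
worker.lb.recover_time=60
# Seconds a member may stay in local error before escalation to global error.
# 30 equals the documented default (half of recover_time); 0 escalates immediately.
worker.lb.error_escalation_time=30
```

Setting error_escalation_time explicitly to its default changes nothing at runtime; it is spelled out here only to show where the attribute belongs, namely on the load balancer worker rather than on its members.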