workers.xml

rjung Fri, 20 Mar 2009 04:35:09 -0700

Author: rjung
Date: Fri Mar 20 11:34:42 2009
New Revision: 756423

URL: http://svn.apache.org/viewvc?rev=756423&view=rev
Log:
Add some more documentation about local and global
error states and error_escalation_time.


Modified:
    tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml
    tomcat/connectors/trunk/jk/xdocs/reference/workers.xml

Modified: tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml
URL: http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml?rev=756423&r1=756422&r2=756423&view=diff
==============================================================================
--- tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml (original)
+++ tomcat/connectors/trunk/jk/xdocs/generic_howto/timeouts.xml Fri Mar 20 11:34:42 2009
@@ -334,7 +334,72 @@
 load balancer.
 </p>
 </subsection>
+</section>
 
+<section name="Load Balancer Error Detection">
+<subsection name="Local and Global Error States">
+<p>
+A load balancer worker does more than balance load.
+It also handles stickiness and failover of requests in case of errors.
+When a load balancer detects an error on one of its members, it needs to
+decide whether the error is serious, only temporary, or perhaps
+related only to the request that was being processed. Temporary errors
+are called local errors; serious errors are called global errors.
+</p>
+<p>
+If the load balancer decides that a backend should be put into the global error
+state, then the web server will not send any more requests there. If no session
+replication is used, this means that all user sessions located on the respective
+backend are no longer available. The users will be sent to another backend
+and will have to log in again. So the global error state is not transparent to the
+users. The application is still available, but users might lose some work.
+</p>
+<p>
+In some cases the decision between local error and global error is easy.
+For instance, if there is an error sending the response back to the client (browser),
+then it is very unlikely that the backend is broken.
+So this situation is a typical example of a local error.
+</p>
+<p>
+Some situations are harder to decide, though. If the load balancer can't establish
+a new connection to a backend, it could be because of a temporary overload situation
+(no more free threads in the backend), or because the backend isn't alive any more.
+Depending on the details, the right state could be either local error or global error.
+</p>
+</subsection>
+<subsection name="Error Escalation Time">
+<p>
+Until version 1.2.26 most errors were interpreted as global errors.
+Starting with version 1.2.27 many errors which were previously interpreted as global
+were switched to being local whenever the backend is still busy. Busy means that
+other concurrent requests are being sent to the same backend (successful or not).
+</p>
+<p>
+In many cases there is no perfect way of making the decision
+between local and global error. The load balancer simply doesn't have enough information.
+Starting with version 1.2.28 you can tune how fast the load balancer switches from local error to
+global error. If a member of a load balancer stays in local error state for too long,
+the load balancer will escalate it into global error state.
+</p>
+<p>
+The time tolerated in local error state is controlled by the load balancer attribute
+<b>error_escalation_time</b> (in seconds). The default value is half of <b>recover_time</b>,
+so unless you changed <b>recover_time</b> the default is 30 seconds.
+</p>
+<p>
+Using a smaller value for <b>error_escalation_time</b> will make the load balancer react
+faster to serious errors, but also carries the risk of losing sessions more often
+in less serious situations. You can lower <b>error_escalation_time</b> down to 0 seconds,
+which means all local errors which are potentially serious are escalated to global errors
+immediately.
+</p>
+<p>
+Note that without good basic error detection the whole escalation procedure is useless.
+So you should definitely use <b>socket_connect_timeout</b> and activate CPing/CPong
+with <b>ping_mode</b> and <b>ping_timeout</b> before thinking about also tuning
+<b>error_escalation_time</b>.
+</p>
+</subsection>
 </section>
 
 </body>

Modified: tomcat/connectors/trunk/jk/xdocs/reference/workers.xml
URL: http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/xdocs/reference/workers.xml?rev=756423&r1=756422&r2=756423&view=diff
==============================================================================
--- tomcat/connectors/trunk/jk/xdocs/reference/workers.xml (original)
+++ tomcat/connectors/trunk/jk/xdocs/reference/workers.xml Fri Mar 20 11:34:42 2009
@@ -932,11 +932,6 @@
 will the node be put into error state.
 </p>
 <p>
-Do not set <b>error_escalation_time</b> to a very short time unless you understand
-the implications. Use <b>socket_connect_timeout</b> and activate CPing/CPong
-with <b>ping_mode</b> and <b>ping_timeout</b> if you want fast error detection.
-</p>
-<p>
 This features has been added in <b>jk 1.2.28</b>.
 </p>
 </directive>
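
For illustration only (not part of this commit): the attributes discussed in
the new timeouts.xml text might be combined in workers.properties roughly as
follows. The worker names (balancer, node1), the host name and all numeric
values are assumptions chosen for the example, not recommendations. The units
noted in the comments for socket_connect_timeout and ping_timeout are taken
from the workers.properties reference, not from this diff.

```properties
# Hypothetical load balancer with one member shown; names and values are examples.
worker.list=balancer

worker.balancer.type=lb
worker.balancer.balance_workers=node1
# Seconds before a member in error state is retried (60 is the usual default).
worker.balancer.recover_time=60
# Seconds a member may stay in local error state before escalation to global
# error. If unset, it defaults to half of recover_time (30 with the line above).
worker.balancer.error_escalation_time=15

worker.node1.type=ajp13
worker.node1.host=backend1.example.invalid
worker.node1.port=8009
# Basic error detection first: connect timeout and CPing/CPong (milliseconds).
worker.node1.socket_connect_timeout=5000
worker.node1.ping_mode=A
worker.node1.ping_timeout=10000
```

In this sketch the member gets its connect timeout and CPing/CPong settings
before error_escalation_time is lowered, which follows the order the new
documentation recommends.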



svn commit: r756423 - in /tomcat/connectors/trunk/jk/xdocs: generic_howto/timeouts.xml reference/workers.xml