Author: rjung
Date: Tue Mar 10 02:14:44 2009
New Revision: 751962

URL: http://svn.apache.org/viewvc?rev=751962&view=rev
Log:
We shouldn't overwrite global error state in cases
where we don't know the real state. If it was real
error before, then keep it in error.

So in all cases except previous global error, we
switch the member into OK.

Then in cases were we know it is fine, we also
switch to OK.

In cases were we know the member is bad we switch to
ERROR.

In all cases were we can't be sure, keep the previous
state (OK or ERROR).

Other states, like BUSY, RECOVER, IDLE or FORCE will
be overwritten with OK after the service, and then
possibly again overwritten with ERROR by the above rules.

Only reset error_time when we switch from non-OK to
OK. Only overwrite error_time with a new error_time,
if we detect a real global problem (and thus switch
the global state to error too).

Modified:
    tomcat/connectors/trunk/jk/native/common/jk_lb_worker.c

Modified: tomcat/connectors/trunk/jk/native/common/jk_lb_worker.c
URL: 
http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/native/common/jk_lb_worker.c?rev=751962&r1=751961&r2=751962&view=diff
==============================================================================
--- tomcat/connectors/trunk/jk/native/common/jk_lb_worker.c (original)
+++ tomcat/connectors/trunk/jk/native/common/jk_lb_worker.c Tue Mar 10 02:14:44 
2009
@@ -1240,10 +1240,12 @@
                     }
                 }
 
-                /* When returning the endpoint mark the worker as not busy.
-                 * We have at least one endpoint free
+                /* When have an endpoint and we ran a request, assume
+                 * we are OK, unless we last were in error.
+                 * We will below explicitely set OK or ERROR according
+                 * to the returned service_stat.
                  */
-                if (rec->s->state == JK_LB_STATE_BUSY) {
+                if (rec->s->state != JK_LB_STATE_ERROR) {
                     rec->s->state  = JK_LB_STATE_OK;
                     p->states[rec->i] = JK_LB_STATE_OK;
                 }
@@ -1279,29 +1281,28 @@
                 else if (service_stat == JK_SERVER_ERROR) {
                     /*
                      * Internal JK server error.
-                     * Don't mark the node as bad.
+                     * Keep previous global state.
+                     * Do not try to reuse the same node for the same request.
                      * Failing over to another node could help.
                      */
-                    rec->s->state  = JK_LB_STATE_OK;
                     p->states[rec->i] = JK_LB_STATE_ERROR;
-                    rec->s->error_time = 0;
                     rc = JK_FALSE;
                 }
                 else if (service_stat == JK_AJP_PROTOCOL_ERROR) {
                     /*
-                     * We've received the bad AJP message from the backend.
-                     * Don't mark the node as bad.
+                     * We've received a bad AJP message from the backend.
+                     * Keep previous global state.
+                     * Do not try to reuse the same node for the same request.
                      * Failing over to another node could help.
                      */
-                    rec->s->state  = JK_LB_STATE_OK;
                     p->states[rec->i] = JK_LB_STATE_ERROR;
-                    rec->s->error_time = 0;
                     rc = JK_FALSE;
                 }
                 else if (service_stat == JK_STATUS_ERROR) {
                     /*
                      * Status code configured as service is down.
-                     * Don't mark the node as bad.
+                     * The node is fine.
+                     * Do not try to reuse the same node for the same request.
                      * Failing over to another node could help.
                      */
                     rec->s->state  = JK_LB_STATE_OK;
@@ -1313,6 +1314,7 @@
                     /*
                      * Status code configured as service is down.
                      * Mark the node as bad.
+                     * Do not try to reuse the same node for the same request.
                      * Failing over to another node could help.
                      */
                     rec->s->errors++;
@@ -1325,7 +1327,9 @@
                     if (aw->s->reply_timeouts > 
(unsigned)p->worker->max_reply_timeouts) {
                         /*
                          * Service failed - to many reply timeouts
-                         * Take this node out of service.
+                         * Mark the node as bad.
+                         * Do not try to reuse the same node for the same 
request.
+                         * Failing over to another node could help.
                          */
                         rec->s->errors++;
                         rec->s->state = JK_LB_STATE_ERROR;
@@ -1334,20 +1338,21 @@
                     }
                     else {
                         /*
-                         * Put lb member into local error,
-                         * so that we will not use it during
-                         * fail over attempts.
+                         * Reply timeout, bot not yet too many of them.
+                         * Keep previous global state.
+                         * Do not try to reuse the same node for the same 
request.
+                         * Failing over to another node could help.
                          */
-                        rec->s->state  = JK_LB_STATE_OK;
                         p->states[rec->i] = JK_LB_STATE_ERROR;
-                        rec->s->error_time = 0;
                     }
                     rc = JK_FALSE;
                 }
                 else {
                     /*
-                     * Service failed !!!
-                     * Time for fault tolerance (if possible)...
+                     * Various unspecific error cases.
+                     * Keep previous global state, if we are not in local 
error since to long.
+                     * Do not try to reuse the same node for the same request.
+                     * Failing over to another node could help.
                      */
                     time_t now = time(NULL);
                     rec->s->errors++;



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to