On 07.03.2009 09:00, mt...@apache.org wrote:
Author: mturk
Date: Sat Mar  7 08:00:54 2009
New Revision: 751217

URL: http://svn.apache.org/viewvc?rev=751217&view=rev
Log:
If the number of channels in error is more than half of the busy channels,
mark the worker global status as error

Modified:
     tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c
     tomcat/connectors/trunk/jk/native/common/jk_lb_worker.c
     tomcat/connectors/trunk/jk/native/common/jk_shm.h

Modified: tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c
URL: http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c?rev=751217&r1=751216&r2=751217&view=diff
==============================================================================
--- tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c (original)
+++ tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c Sat Mar  7 08:00:54 2009
@@ -2120,6 +2120,8 @@
      aw->s->transferred += e->wr;
      if (aw->s->busy)
          aw->s->busy--;
+    if (aw->s->in_error)
+        aw->s->in_error--;
      if (rc == JK_TRUE) {
          aw->s->state = JK_AJP_STATE_OK;
      }
@@ -2130,6 +2132,7 @@
      else {
          aw->s->state = JK_AJP_STATE_ERROR;
          aw->s->errors++;
+        aw->s->in_error++;
          aw->s->error_time = time(NULL);
      }
  }

I think this can't possibly work. We decrement once for each finished request (error or not), and we increment afterwards only if the request returned with an error.

So if

- the value is zero and we have an error, we end up with 1
- the value is one and we have no error, we end up with zero
- the value is one and we have another error, we end up again with 1-1+1=1 (not 2).

Therefore in_error will never exceed the value 1 here.
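
To see it concretely, here is a tiny standalone simulation of the committed accounting (only the name in_error is taken from the patch, everything else is made up for illustration):

    #include <stdio.h>

    static int in_error = 0;

    /* Mirrors the patch: the counter is decremented for every finished
     * request and incremented again only when the request failed. */
    static void request_done(int had_error)
    {
        if (in_error)
            in_error--;
        if (had_error)
            in_error++;
    }

    int main(void)
    {
        request_done(1);                      /* error:   0 -> 1      */
        request_done(1);                      /* error:   1 -> 0 -> 1 */
        request_done(0);                      /* success: 1 -> 0      */
        request_done(1);                      /* error:   0 -> 1      */
        printf("in_error = %d\n", in_error);  /* prints 1, never 2    */
        return 0;
    }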

Plan B: Counting error excess

We could change it to decrement only in the cases JK_TRUE or JK_CLIENT_ERROR, but then we would count the excess of errors over OK requests. So whenever more OKs follow than errors, the counter would go back to 0. The problem with this counter is that I cannot see any good criterion for deciding about the global node state. The excess could only be related to the load, e.g. to how many requests per unit of time we were handling recently, and that's another number we don't have.
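
For illustration, a rough sketch of that variant as a helper (the function name is made up; it assumes the ajp_worker_t declarations from jk_ajp_common.h and the return codes visible in the diff, so don't read it as a proposed patch):

    /* Plan B: only JK_TRUE / JK_CLIENT_ERROR pay an error back, so
     * in_error counts the excess of errors over OK requests and drops
     * back to 0 as soon as enough requests succeed again. */
    static void update_in_error(ajp_worker_t *aw, int rc)
    {
        if (rc == JK_TRUE || rc == JK_CLIENT_ERROR) {
            if (aw->s->in_error)
                aw->s->in_error--;
        }
        else {
            aw->s->in_error++;
        }
    }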

Something along these lines:

- add a special request counter x
- handle in_errors as described in B)
- reset x to zero whenever in_errors is zero
- increment x by one for each request as long as in_errors is positive
- in the lb "else" branch, choose global error if in_error is bigger than N% of x (e.g. N=10 or N=50). But we still need some correction for small counts, like in_errors=x=1 (a rough sketch follows below).
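
Roughly like this (x is kept in a hypothetical new shared memory field req_since_error, and N is hard coded to 50 just as an example; none of this exists today):

    /* bookkeeping after each finished request on this worker */
    if (aw->s->in_error == 0)
        aw->s->req_since_error = 0;      /* the counter "x" */
    else
        aw->s->req_since_error++;

    /* in the lb "else" branch, with e.g. N = 50 (percent);
     * in_error > 1 skips the degenerate in_errors = x = 1 case */
    if (aw->s->in_error > 1 &&
        aw->s->in_error * 100 > 50 * aw->s->req_since_error) {
        /* more than N% of the requests counted by x ended in error,
           so treat the whole node as being in error */
    }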

Plan C: Counting error endpoints (approximation of busy errors)

Each endpoint could remember whether it last had an error or not. Then after the request it would

- increment in_errors, if it went from OK to error
- decrement in_errors, if it went from error to OK
- keep in_errors the same otherwise
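
As a sketch (ep->last_error would be a new per-endpoint flag that does not exist today, and deriving had_error from rc like below is just one possible choice):

    /* at the end of a request served by endpoint ep */
    int had_error = (rc != JK_TRUE && rc != JK_CLIENT_ERROR);

    if (had_error && !ep->last_error)
        aw->s->in_error++;               /* endpoint went OK -> error */
    else if (!had_error && ep->last_error && aw->s->in_error)
        aw->s->in_error--;               /* endpoint recovered        */
    /* otherwise in_error stays the same */

    ep->last_error = had_error;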

But still, since there is no fair usage distribution over the endpoints, this will not give a useful number (lots of OK requests could use the same endpoint and all the error requests could be distributed over many different endpoints or vice versa).

The problem comes from the fact that busy is a snapshot number, and there is no way to tell how many of the requests currently in flight will return with an error.

Summary:

I still like the idea of using the error_time. Each OK request will reset it, and that's fine. As long as something good is coming back, the node still gets a chance globally. But if there are no OKs for some time, we should switch to global ERROR.
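
In code that could roughly look like the following, assuming we change the semantics so that error_time is only set on the first error after the last OK, i.e. it marks when the trouble started (RECOVER_WAIT is a made-up name for the 10 or 60 second threshold, and the exact spot in jk_lb_worker.c is left open):

    /* on every OK request: clear the mark */
    aw->s->error_time = 0;

    /* on an error: remember when the trouble started, but do not
     * refresh it while the trouble persists */
    if (aw->s->error_time == 0)
        aw->s->error_time = time(NULL);

    /* in the lb decision: nothing good came back for RECOVER_WAIT seconds */
    if (aw->s->error_time > 0 &&
        (long)(time(NULL) - aw->s->error_time) >= RECOVER_WAIT) {
        /* switch the node to global ERROR */
    }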

After 10 seconds or after 60 seconds: I think 60 seconds is pretty long, but I would accept it as a compromise :)

Regards,

Rainer
