On 07.03.2009 09:00, mt...@apache.org wrote:
Author: mturk
Date: Sat Mar 7 08:00:54 2009
New Revision: 751217
URL: http://svn.apache.org/viewvc?rev=751217&view=rev
Log:
If the number of channels in error is more than half of the busy channels,
mark the worker's global status as error
Modified:
tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c
tomcat/connectors/trunk/jk/native/common/jk_lb_worker.c
tomcat/connectors/trunk/jk/native/common/jk_shm.h
Modified: tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c
URL: http://svn.apache.org/viewvc/tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c?rev=751217&r1=751216&r2=751217&view=diff
==============================================================================
--- tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c (original)
+++ tomcat/connectors/trunk/jk/native/common/jk_ajp_common.c Sat Mar 7 08:00:54 2009
@@ -2120,6 +2120,8 @@
         aw->s->transferred += e->wr;
         if (aw->s->busy)
             aw->s->busy--;
+        if (aw->s->in_error)
+            aw->s->in_error--;
         if (rc == JK_TRUE) {
             aw->s->state = JK_AJP_STATE_OK;
         }
@@ -2130,6 +2132,7 @@
         else {
             aw->s->state = JK_AJP_STATE_ERROR;
             aw->s->errors++;
+            aw->s->in_error++;
             aw->s->error_time = time(NULL);
         }
     }
I think this can't possibly work. We decrement once for each finished
request (error or not), and we increment again after the request if it
returned with an error.
So if
- the value is zero and we have an error, we end up with 1
- the value is one and we have no error, we end up with 0
- the value is one and we have another error, we again end up with
  1 - 1 + 1 = 1 (not 2).
Therefore in_error can never exceed 1.
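A minimal C sketch of the committed update, assuming requests finish one
after another as in the argument above. The helper names are hypothetical,
not mod_jk code, and "failed" stands for the error "else" branch of the diff:

```c
#include <stddef.h>

/*
 * Hypothetical helper: the r751217 counter update for one finished request.
 * Every finished request decrements in_error (if positive); a failed
 * request then increments it again.
 */
static int update_in_error(int in_error, int request_failed)
{
    if (in_error)
        in_error--;          /* done for every finished request */
    if (request_failed)
        in_error++;          /* done only in the error branch */
    return in_error;
}

/*
 * Feeds outcomes (1 = error, 0 = OK) through the update one at a time
 * and returns the largest value the counter ever reached.
 */
static int max_in_error(const int *failed, size_t n)
{
    int in_error = 0;
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        in_error = update_in_error(in_error, failed[i]);
        if (in_error > max)
            max = in_error;
    }
    return max;
}
```

Even five errors in a row only reach a maximum of 1. This models the
sequential case only; with truly concurrent requests the separate decrement
and increment could interleave differently.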
Plan B: Counting error excess
We could change it to decrement only in the JK_TRUE or JK_CLIENT_ERROR
cases, but then we would count the excess of errors over OK requests:
whenever more OKs than errors follow, the counter goes back to 0. The
problem with this counter is that I cannot see any good criterion for
deciding the global node state. The excess could only be judged relative
to the load, e.g. to how many requests per unit of time we were handling
recently. That's another number we don't have.
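A sketch of the Plan B update in C. The outcome names are hypothetical
stand-ins for JK_TRUE, JK_CLIENT_ERROR and any other (backend) error:

```c
#include <stddef.h>

/* Hypothetical outcome classes mirroring the cases named above. */
typedef enum {
    OUTCOME_OK,           /* stands in for JK_TRUE */
    OUTCOME_CLIENT_ERROR, /* stands in for JK_CLIENT_ERROR */
    OUTCOME_ERROR         /* any other error */
} outcome_t;

/* Plan B: errors push the counter up, OK and client errors pull it down. */
static int plan_b_update(int in_error, outcome_t outcome)
{
    if (outcome == OUTCOME_ERROR)
        return in_error + 1;
    return in_error > 0 ? in_error - 1 : 0;
}

/* Runs a sequence of outcomes and returns the resulting error excess. */
static int plan_b_run(const outcome_t *outcomes, size_t n)
{
    int in_error = 0;
    for (size_t i = 0; i < n; i++)
        in_error = plan_b_update(in_error, outcomes[i]);
    return in_error;
}
```

Three errors leave the counter at 3 whether the node handles one request
per minute or a thousand per second, and three following OKs bring it back
to 0. That is the missing load reference.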
Something along these lines:
- add a special request counter x
- handle in_error as described in Plan B
- reset x to zero whenever in_error is zero
- increment x by one for each request as long as in_error is positive
- in the lb "else" branch, choose global error if in_error is bigger than
  N% of x (e.g. N=10 or N=50). But wait, we need some correction for small
  values, like in_error = x = 1.
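One way to sketch the x counter and the N% check in C. The struct, the
function names and the min_requests guard (one possible answer to the
in_error = x = 1 problem) are all hypothetical:

```c
#include <stdbool.h>

/* Hypothetical state for the check sketched above. */
typedef struct {
    int in_error;   /* Plan B error excess */
    long x;         /* requests seen since in_error was last zero */
} err_window_t;

static void window_update(err_window_t *w, bool failed)
{
    /* Plan B counting */
    if (failed)
        w->in_error++;
    else if (w->in_error > 0)
        w->in_error--;

    if (w->in_error == 0)
        w->x = 0;   /* reset whenever the excess is back to zero */
    else
        w->x++;     /* count requests while the excess is positive */
}

/*
 * Global error if in_error > pct% of x. Hypothetical guard: below
 * min_requests requests the ratio is not trusted at all.
 */
static bool window_global_error(const err_window_t *w, int pct,
                                long min_requests)
{
    if (w->x < min_requests)
        return false;
    return (long)w->in_error * 100 > (long)pct * w->x;
}
```

After ten errors followed by six OKs, in_error is 4 and x is 16: that is
25% of x, so N=10 would switch to global error while N=50 would not.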
Plan C: Counting error endpoints (approximation of busy errors)
Each endpoint could remember whether its last request had an error or not.
Then after each request it would
- increment in_error, if it went from OK to error
- decrement in_error, if it went from error to OK
- keep in_error the same otherwise
But still, since requests are not distributed fairly over the endpoints,
this will not give a useful number: lots of OK requests could use the
same endpoint while the error requests are spread over many different
endpoints, or vice versa.
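A sketch of the Plan C transition rule in C, with a hypothetical
per-endpoint flag (the type and function names are not mod_jk code):

```c
#include <stdbool.h>

/* Hypothetical per-endpoint state: did the last request on it fail? */
typedef struct {
    bool last_failed;
} endpoint_t;

/* Plan C: change the worker-wide in_error only on OK <-> error transitions. */
static void plan_c_update(int *in_error, endpoint_t *ep, bool failed)
{
    if (failed && !ep->last_failed)
        (*in_error)++;          /* OK -> error */
    else if (!failed && ep->last_failed)
        (*in_error)--;          /* error -> OK */
    ep->last_failed = failed;   /* otherwise unchanged */
}
```

With four endpoints, if one endpoint serves 100 OK requests and the other
three see one error each, in_error ends up at 3. If instead one endpoint
serves 100 errors and the other three are OK, it stays at 1. That is the
skew described above.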
The underlying problem is that busy is a snapshot number, and there is no
way to tell how many of the requests currently in flight will return with
an error.
Summary:
I still like the idea of using the error_time. Each OK request will
reset it, and that's fine. As long as something good is coming back, the
node still has a chance globally. But if there are no OKs for some time,
we should switch the global state to ERROR.
As for switching after 10 or after 60 seconds: I think 60 seconds is
pretty long, but I would accept it as a compromise :)
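A sketch of the error_time idea in C. It assumes error_time records the
first error since the last OK (set only while it is zero), rather than
being overwritten on every error as in the diff above; otherwise a steady
stream of errors would keep it fresh and the timeout would never fire.
The type, function names and timeout parameter are hypothetical:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical status, not the mod_jk shared-memory struct. */
typedef struct {
    time_t error_time;  /* first error since the last OK, 0 if none */
} worker_status_t;

static void status_on_request_done(worker_status_t *s, bool ok, time_t now)
{
    if (ok)
        s->error_time = 0;      /* any OK resets the clock */
    else if (s->error_time == 0)
        s->error_time = now;    /* start the clock at the first error */
}

/* Global ERROR once no OK has come back for timeout_sec seconds. */
static bool status_global_error(const worker_status_t *s, time_t now,
                                double timeout_sec)
{
    return s->error_time != 0 &&
           difftime(now, s->error_time) >= timeout_sec;
}
```

Passing 10.0 or 60.0 as timeout_sec selects between the two values
discussed.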
Regards,
Rainer