Feature request /Discussion: JK loadbalancer improvements for high load

2007-07-04 Thread Yefym Dmukh
Hi all,
sorry for the stress, but it seems it is time to come back to the
discussion about load balancing for the JVM (Tomcat).

Background:
Recently we ran benchmark and smoke tests of our product at the Sun high
tech centre in Langen (Germany).

The web server was Apache 2.2.4, the container was 10 x Tomcat 5.5.25,
and the load balancer was the JK connector 1.2.23 with the Busyness
algorithm.

Under high load we observed strange behaviour: some Tomcat workers
temporarily received a disproportionate load, often 10 times higher than
the others, for relatively long periods. As a result, response times that
usually stay under 500 ms went up to 20+ sec, which in turn made the
overall test results almost twice as bad as estimated.

At first we were quite confused, because we were sure that the JVM
configuration was not the problem and suspected that the cause was in
the LB logic of mod_jk; both assumptions were right.

What was actually happening: the LB sends requests and the session becomes
sticky, so subsequent requests keep going to the same cluster node. At
some point the JVM started a major garbage collection (full GC) and spent
the 20 seconds mentioned above on it. Meanwhile jk kept sending that node
both new requests and the requests sticky to it, which led to the
situation where that one node broke the SLA on response times.

I've been searching the web for a while for a load balancer
implementation that takes GC activity into account and reduces the load
accordingly when a JVM is close to a major collection, but found nothing.

Once again: load balancing of JVMs under load is a real issue in
production. With optimally distributed load you can not only lower costs
but also prevent a bad customer experience, not to mention broken SLAs.

Feature request:

All LB algorithms should be extended with a bidirectional connection to
the JVM:
 JVM -> LB: old gen size and current occupancy
 LB -> JVM: prevent node overload and advise a GC, depending on a
 configurable percentage of free old gen space.
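
A rough sketch of what the balancer side of such an exchange could look
like, written in Python for brevity. Nothing here is an existing mod_jk
feature: OldGenReport, decide, the action strings and the 20% threshold
are made-up names and values for illustration only.

```python
from dataclasses import dataclass


@dataclass
class OldGenReport:
    """What a node would report to the balancer (JVM -> LB direction)."""
    node: str
    old_gen_size_kb: int   # capacity of the old generation
    old_gen_used_kb: int   # current occupancy

    @property
    def free_percent(self) -> float:
        free_kb = self.old_gen_size_kb - self.old_gen_used_kb
        return 100.0 * free_kb / self.old_gen_size_kb


def decide(report: OldGenReport, min_free_percent: float = 20.0) -> str:
    """LB -> JVM direction: return the action the balancer would take.

    min_free_percent is the hypothetical configurable threshold of free
    old gen space below which the node is relieved and a GC is advised.
    """
    if report.free_percent < min_free_percent:
        return "reduce-load-and-advise-gc"
    return "keep-balancing"


# Sample values only, not measurements from the benchmark.
reports = [
    OldGenReport("tomcat1", 2359296, 1102956),
    OldGenReport("tomcat2", 2359296, 2200000),
]
for r in reports:
    print(r.node, round(r.free_percent, 1), decide(r))
```

How the report travels from the JVM to the balancer is left open here;
that transport is exactly the part that would need new features.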
 

All ideas and comments are appreciated.

Regards,
Yefym.

Re: Feature request /Discussion: JK loadbalancer improvements for high load

2007-07-05 Thread Yefym Dmukh
Hi Rainer,
it seems we still have an issue in our JVM config: we had not noticed the
concurrent mode GC failures that led to Tomcat overload and bad responses.

14896.148: [Full GC 14896.148: [CMS (concurrent mode failure): 
1102956K->326641K(2359296K), 4.0684463 secs] 1280549K->326641K(3067136K), 
[CMS Perm : 131071K->62120K(131072K)], 4.0690589 secs]

15346.161: [Full GC 15346.161: [CMS (concurrent mode failure): 
1546160K->324956K(2359296K), 3.4321099 secs] 1637284K->324956K(3067136K), 
[CMS Perm : 131071K->6K(131072K)], 3.4327402 secs]
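
For anyone who wants to pull such events out of a GC log automatically,
here is a minimal Python sketch. The regular expression is written
against the two entries above only and is an assumption about the log
layout, not a general GC log parser; parse_failures is a made-up name.

```python
import re

# Matches the old-generation part of a CMS "concurrent mode failure" entry:
#   <before>K-><after>K(<capacity>K), <seconds> secs]
CMS_FAILURE = re.compile(
    r"\[Full GC.*?\(concurrent mode failure\):\s*"
    r"(\d+)K->(\d+)K\((\d+)K\),\s*([\d.]+) secs\]",
    re.DOTALL,  # entries are wrapped over several lines
)


def parse_failures(log_text):
    """Yield (used_before_kb, used_after_kb, capacity_kb, cms_secs)."""
    for m in CMS_FAILURE.finditer(log_text):
        before, after, capacity = (int(g) for g in m.group(1, 2, 3))
        yield before, after, capacity, float(m.group(4))


log = """14896.148: [Full GC 14896.148: [CMS (concurrent mode failure):
1102956K->326641K(2359296K), 4.0684463 secs] 1280549K->326641K(3067136K),
[CMS Perm : 131071K->62120K(131072K)], 4.0690589 secs]

15346.161: [Full GC 15346.161: [CMS (concurrent mode failure):
1546160K->324956K(2359296K), 3.4321099 secs] 1637284K->324956K(3067136K),
[CMS Perm : 131071K->6K(131072K)], 3.4327402 secs]
"""

for before, after, capacity, secs in parse_failures(log):
    print(f"old gen at {100 * before / capacity:.0f}% before the failure, "
          f"{secs:.2f} s spent in the CMS full collection")
```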



Unfortunately the concurrent collectors in Java themselves need 30%-50%
of the memory, so if the JVM is still allocating objects in the tenured
generation and there is not enough memory left for the concurrent GC, it
falls back to a stop-the-world collection. In any case these events are
raised inside the JVM, and it would be really nice if the load balancer
were smart enough to react to them, e.g. by redirecting the load.

Agreed, this problem is not easy to solve and may require some features
on the JVM side that are not implemented yet.


Regards,
Y.

--------
Yefym Dmukh
InterComponentWare AG
R&D Personal Health Record
Industriestraße 41
69190 Walldorf (Baden)
Germany 

Phone:  +49 6227 385 122
Fax:+49 6227 385 588
Mail:   [EMAIL PROTECTED]
--
** www.icw.de** 
**   www.LifeSensor.com **


Re: Feature request /Discussion: JK loadbalancer improvements for high load

2007-07-05 Thread Yefym Dmukh
>You have an oxymoron here. With session stickiness you are willingly
>tearing down the load balancer's correctness because you don't wish to
>(or can't) have session replication.
>Even with the smartest LB and even with two-way communication
>it's only possible to make that work in a topology without session
>stickiness. Otherwise you would lose sessions on each GC cycle.
>However, like Rainer said, the solution is to choose
>the appropriate GC strategy for a web-based application.

Generally you are right, but the ideal world is not reality:
we use the Apache MyFaces implementation of JSF, where the core session
class is about 500 KB in size; compressing the state kills the CPU, and
the size kills the session replication/failover approach. So we have what
we have, and we are trying to get the best out of it.

BTW, what about the bidirectional JVM-LB connection and stop-the-world
GC managed by the LB, as a keep-it-simple approach?






In reply to: Mladen Turk, 05.07.2007 13:19, Tomcat Developers List
(message quoted above).




Re: Feature request /Discussion: JK loadbalancer improvements for high load

2007-07-05 Thread Yefym Dmukh
Mladen wrote:
>This won't help much. The sticky session requests must be served
>by the same node (group of nodes if using domain clustering),
>and your requests will still be delayed by the JVM instance GC cycle
>(It has to happen sometime, and you cannot depend on request
>void intervals)

The main point was to use stop-the-world collectors PROACTIVELY managed
by the LB in order to:
- eliminate the memory and CPU overhead of the ConcurrentXXX /
  ParallelXXX collectors (i.e. up to 50% for memory),
- prevent overloading the single node that falls into a GC cycle,
- reduce the complexity of JVM tuning.

With the following algorithm:
the heap is almost full -> stop sending requests to the node and load
the other nodes instead (wait some time to give the current workers a
chance to finish their jobs), then advise a stop-the-world collection.
Heap free again? Resume the load.
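
The algorithm above can be read as a small state machine per node. A
minimal Python sketch follows, with assumptions stated up front:
ManagedNode, the 85% and 50% thresholds and the "no requests in flight"
condition are invented for illustration (a timeout could replace the
latter), and how the balancer would actually advise the JVM to collect
is left out.

```python
from enum import Enum


class NodeState(Enum):
    ACTIVE = "active"          # receives new requests
    DRAINING = "draining"      # no new requests, in-flight ones finish
    COLLECTING = "collecting"  # stop-the-world GC has been advised


class ManagedNode:
    """Balancer-side view of one JVM (all names here are hypothetical)."""

    def __init__(self, name, almost_full=0.85, free_again=0.50):
        self.name = name
        self.almost_full = almost_full  # heap fraction that starts draining
        self.free_again = free_again    # heap fraction at which load resumes
        self.state = NodeState.ACTIVE

    def update(self, heap_used_fraction, in_flight_requests):
        """Advance the state from one report and return the new state."""
        if self.state is NodeState.ACTIVE:
            if heap_used_fraction >= self.almost_full:
                self.state = NodeState.DRAINING
        elif self.state is NodeState.DRAINING:
            if in_flight_requests == 0:
                # Here the balancer would advise the JVM to collect.
                self.state = NodeState.COLLECTING
        elif self.state is NodeState.COLLECTING:
            if heap_used_fraction <= self.free_again:
                self.state = NodeState.ACTIVE
        return self.state

    def accepts_new_requests(self):
        return self.state is NodeState.ACTIVE


node = ManagedNode("tomcat1")
for used, busy in [(0.60, 12), (0.90, 12), (0.91, 3), (0.91, 0), (0.20, 0)]:
    print(used, busy, node.update(used, busy).value)
```

The sketch does not address sticky requests that still need the drained
node, which is the objection quoted at the top of this message.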


Certainly this makes sense (if at all) only for setups with many small
JVMs that have little CPU (or CPU limited to the current needs) and a
huge amount of memory, for stateless applications or applications with a
short session lifecycle.

It might also make sense for domain-based installations: during GC
cycles the other domain members would take over the jobs.
All this makes sense only if the load balancer manages not only the load
but also the GC cycles: a kind of ideal JVM-level load balancer for high
load.

Regards,
Y.

Re: Feature request /Discussion: JK loadbalancer improvements for high load

2007-07-05 Thread Yefym Dmukh
>Question, why would you need GC,heap or CPU stats from the Tomcat 
>machine at all?
>in most cases, none of these stats can predict response times.
>The best way would be to simply record a history of response times, and 
>then mod_jk can have algorithms (maybe even pluggable) on how to use 
>those stats for future request.
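
For reference, the response-time-history idea quoted above might look
roughly like this in Python. It is only a sketch of the suggestion:
ResponseTimeHistory and pick_fastest are invented names, the window
size and sample values are arbitrary, and mod_jk has no such hook
according to this thread.

```python
from collections import deque
from statistics import mean


class ResponseTimeHistory:
    """Keeps the most recent response times (in ms) per worker."""

    def __init__(self, window=100):
        self.window = window
        self.samples = {}

    def record(self, worker, millis):
        history = self.samples.setdefault(worker, deque(maxlen=self.window))
        history.append(millis)

    def average(self, worker):
        history = self.samples.get(worker)
        # A worker without history is treated as fast so that it gets traffic.
        return mean(history) if history else 0.0


def pick_fastest(history, workers):
    """One possible pluggable policy: the lowest recent average wins."""
    return min(workers, key=history.average)


history = ResponseTimeHistory(window=3)
for ms in (450, 480, 20000):   # tomcat1 has just gone through a long pause
    history.record("tomcat1", ms)
for ms in (400, 500, 450):
    history.record("tomcat2", ms)

print(pick_fastest(history, ["tomcat1", "tomcat2"]))
```

Note that such a policy only sees a pause once a slow response has
completed and been recorded.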

As for me, the Busyness algorithm in mod_jk is almost perfect, both as an
idea and as an implementation;
its only weak point is that it balances the JVMs in the background. The
point is that the JVM has an artifact such as GC which, under certain
circumstances, causes a denial of service for a while :) and the load
balancer notices it too late; even worse, for whatever reason (stickiness
or internal logic with the job queue) the LB doesn't react accordingly at
the right time.
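
To illustrate the "too late" part, here is a deliberately simplified
Python model of busyness-style selection: pick the worker that is
currently serving the fewest requests, unless the request is sticky.
This is not mod_jk's code (worker weights and everything else are left
out), and pick_worker and the counters are made up for the example.

```python
def pick_worker(in_flight, sticky_route=None):
    """Pick the least busy worker unless the request is bound to a session."""
    if sticky_route is not None:
        return sticky_route  # stickiness overrides the load comparison
    return min(in_flight, key=in_flight.get)


# Requests currently being served per worker. tomcat1 has just entered a
# full GC, but its counter does not show that yet.
in_flight = {"tomcat1": 1, "tomcat2": 3}

assignments = []
for sticky in (None, None, "tomcat1", None):
    worker = pick_worker(in_flight, sticky)
    in_flight[worker] += 1  # nothing completes during this short snapshot
    assignments.append(worker)

print(assignments)
```

In this model the paused node keeps receiving new sessions until its
counter has caught up with the others, and sticky requests go to it
regardless.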

All the other issues like threading and CPU did not come from me. In my
opinion they are useless. I was only talking about JVM load balancing
that reacts appropriately to JVM-specific things.