On 02/21/11 10:21, Mark Thomas wrote:
> The ASF Sonar installation managed to generate 46GB of identical log
> messages [1] today in the 8 hours it took to notice it was down.
Continuing to drive down the cost of disk storage :-)
> While better monitoring would/should have identified the problem sooner,
> it does demonstrate a problem with the acceptor threads in all three
> endpoints. If there is a system-level issue that causes the accept()
> call to always fail (such as hitting the ulimit) then the endpoint
> essentially enters a loop where it logs an error message for every
> iteration of the loop. This will result in many log messages per second.
>
> I'd like to do something about this. I was thinking of something along
> the lines of the following for each endpoint.
>
> Index: java/org/apache/tomcat/util/net/JIoEndpoint.java
> ===================================================================
> --- java/org/apache/tomcat/util/net/JIoEndpoint.java (revision 1072939)
> +++ java/org/apache/tomcat/util/net/JIoEndpoint.java (working copy)
> @@ -183,9 +183,19 @@
> @Override
> public void run() {
>
> + int errorDelay = 0;
> +
> // Loop until we receive a shutdown command
> while (running) {
>
> + if (errorDelay > 0) {
> + try {
> + Thread.sleep(errorDelay);
> + } catch (InterruptedException e) {
> + // Ignore
> + }
> + }
> +
> // Loop if endpoint is paused
> while (paused && running) {
> try {
> @@ -225,9 +235,15 @@
> // Ignore
> }
> }
> + errorDelay = 0;
> } catch (IOException x) {
> if (running) {
> log.error(sm.getString("endpoint.accept.fail"), x);
> + if (errorDelay == 0) {
> + errorDelay = 50;
> + } else if (errorDelay < 1600) {
> + errorDelay = errorDelay * 2;
> + }
> }
> } catch (NullPointerException npe) {
> if (running) {
>
>
>
> Thoughts / comments?
+1 - a bit of smarts in reducing redundant logging is usually a good thing.
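
For what it's worth, the delay progression in the patch can be pulled out into a small helper so it is easy to check in isolation. This is only a rough sketch, not the actual Tomcat code: the class name AcceptBackoff and the method nextDelay are made up for illustration, while the 50 ms starting delay and the 1600 ms ceiling are taken from the patch above.

```java
// Hypothetical helper (not part of Tomcat) that mirrors the errorDelay
// logic from the proposed patch.
final class AcceptBackoff {

    // Values taken from the patch: first delay 50 ms, doubling stops at 1600 ms.
    static final int INITIAL_DELAY_MS = 50;
    static final int MAX_DELAY_MS = 1600;

    private AcceptBackoff() {
        // Static helper, no instances
    }

    // Returns the delay to use after another failed accept().
    // A current delay of 0 means the previous accept() succeeded.
    static int nextDelay(int currentDelay) {
        if (currentDelay == 0) {
            return INITIAL_DELAY_MS;
        } else if (currentDelay < MAX_DELAY_MS) {
            return currentDelay * 2;
        }
        return currentDelay;
    }

    public static void main(String[] args) {
        // Simulate seven consecutive accept() failures and print each delay.
        int delay = 0;
        for (int i = 0; i < 7; i++) {
            delay = nextDelay(delay);
            System.out.println(delay);
        }
    }
}
```

With these values a persistent failure settles at one log message every 1.6 seconds or so, rather than many per second, and the first successful accept() resets the delay to 0.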