https://bz.apache.org/bugzilla/show_bug.cgi?id=64765

            Bug ID: 64765
           Summary: ThreadPoolExecutor#submittedCount wrong after undeploy
           Product: Tomcat 9
           Version: 9.0.38
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Util
          Assignee: dev@tomcat.apache.org
          Reporter: m...@bsi-software.com
  Target Milestone: -----

Created attachment 37468
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37468&action=edit
screenshots

Steps to reproduce:
1. download Tomcat 9.0.38 (older versions are affected as well)
2. configure user for the manager application
3. start Tomcat
4. navigate to the manager web application and install a webapp having a
servlet that performs some long running tasks (e.g. 30 seconds for one request)
5. Open several browsers (because some browsers limit the number of connections
to the same host) and navigate to the installed webapp (e.g. query the servlet
20 times in parallel). This puts the worker threads of the internal thread pool
at load.
6. In the manager web application undeploy the webapp again while the 20
requests are still running.
7. After a few seconds the webapp is undeployed.
8. Redeploy the webapp again
9. Refresh the 20 web pages to put the worker threads at load again.

Observation after the redeploy: Some of the requests arrive at the servlet
immediately (as it should) and some only arrive after a long delay (e.g. 30
seconds).

Problem Analysis:
The Thread Pool uses a core pool size of 10 and a max pool size of 200. So
basically, there is space to create more worker threads.
However, the pool enqueues the request in the work queue without creating a new
worker for it. And as the other workers are all busy handling the 30 seconds
long requests, it stays in the queue until a worker finishes.
Expected behavior would be that a new worker thread is created and the request
is processed immediately.

The reason why no worker is created can be found in code line [1]: The queue
uses the submittedCount property of the pool to decide whether new workers are
required. Unfortunately, at this point the value returned is wrong (too low).
As a result, the runnable is just saved without creating more workers.

The submittedCount value of the pool is running out of sync during the undeploy
of the webapp. At a certain point (if the webapp is undeployed while there are
long running requests working) this number becomes a negative value.
The attached screenshots illustrate the moment in which the submittedCount
would become negative:
Screenshot 0 shows the two stack frames both standing at [2].
Screenshot 1 shows that at this moment the submittedCount property already has
the value 0 and would become negative right afterwards.
Screenshot 2 shows the position of worker "exec-11" currently executing
"afterExecute" for task with id=617
Screenshot 3 shows the position of worker "exec-13" currently executing
"afterExecute" for the same task but in the catch block (see [4]).
Screenshot 4 shows the position of worker "exec-13" again but indicating that
the exception is of type "StopPooledThreadException".

After all threads completed their work, submittedCount in the thread pool has a
value of e.g. -8.
If the webapp is deployed again and e.g. 10 worker threads are still in the
pool, the 10 first requests to the webapp are processed immediately as there
are workers available.
And as these requests are quite long, these 10 workers stay in use. The next
request would basically be the first that requires a new worker thread to be
created.
But as the submittedCount now has the value 2 (-8 after the deploy + 10 new
requests) the task queue will return at [1] putting the request in the waiting
queue.
The correct behavior would be to return false (two lines further down) which
would instruct the pool to create new workers.
The request will only be processed when the next 30 seconds request ends. In
this situation tomcat must be restarted because responsiveness drops rapidly.

Some thoughts:
- I am not sure about the internals but is correct that the "afterExecute" is
called several times in parallel for the same task as shown in the screenshots?
- In case the StopPooledThreadException is thrown at [3] it is caught from [4]
which executes "afterExecute" again as shown in screenshot 3. Is this correct?

In general, somehow should be ensured that the submittedCount of
"org.apache.tomcat.util.threads.ThreadPoolExecutor" cannot become negative.
Otherwise the whole thread pool is broken.


[1]
https://github.com/apache/tomcat/blob/9.0.38/java/org/apache/tomcat/util/threads/TaskQueue.java#L79
[2]
https://github.com/apache/tomcat/blob/9.0.38/java/org/apache/tomcat/util/threads/ThreadPoolExecutor.java#L95
[3]
https://github.com/apache/tomcat/blob/9.0.38/java/org/apache/tomcat/util/threads/ThreadPoolExecutor.java#L118
[4]
https://github.com/openjdk/jdk/blob/jdk-11%2B7/src/java.base/share/classes/java/util/concurrent/ThreadPoolExecutor.java#L1131

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to