On 08/02/2024 17:07, Mark Thomas wrote:
Hi all,

TL;DR: tagging is likely to be delayed while an APR/native stability issue is addressed

We have had a couple of issues with test stability in the last few days.

The issues with 11.0.x and 10.1.x were caused by the incomplete convenience binary for Tomcat Native 2.0.7. That should be resolved now. The 11.0.x tests are already green and I am expecting 10.1.x to be green for the next run.

9.0.x and 8.5.x are a little more interesting. The instability was triggered by the change to allow users to provide an SSLContext directly to SSLHostConfigCertificate. This changed the timing of endpoint destruction enough to make the intermittent APR crashes much more frequent - on almost every run.

The good news is that the more frequent crashes made it easier to investigate. My current theory is related to the cleanup of OpenSSLContext. In 9.0.x and 8.5.x, cleanup of this object is performed by a finalizer. This is to support runtime replacement of the SSLHostConfig.

What I think happens is:
- Tomcat starts shutdown
- Endpoint is destroyed
- AprLifecycleListener shuts down Native library
- Finalizer runs and tries to reference native code, leading to a crash
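
For illustration, the suspected ordering problem can be sketched in plain Java. This is only a model of the theory, not Tomcat code: FakeNativeLibrary, FakeNativeContext and ShutdownOrderDemo are hypothetical names, the finalizer is simulated by an explicit cleanup() call because real GC timing is not deterministic, and the native crash is represented by an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the native library. Not Tomcat's API.
final class FakeNativeLibrary {
    private boolean terminated = false;

    void terminate() {
        terminated = true;
    }

    // Stands in for a native "free context" call. In a real JVM a call into
    // a terminated native library can crash the process; here it throws.
    void freeContext(long handle) {
        if (terminated) {
            throw new IllegalStateException("native call after terminate, handle=" + handle);
        }
    }
}

// Hypothetical resource whose cleanup needs the native library.
final class FakeNativeContext {
    private final FakeNativeLibrary library;
    private final long handle;

    FakeNativeContext(FakeNativeLibrary library, long handle) {
        this.library = library;
        this.handle = handle;
    }

    // The work a finalizer would perform for this object.
    void cleanup() {
        library.freeContext(handle);
    }
}

final class ShutdownOrderDemo {
    // Returns the ordered list of events. With cleanupFirst == false the
    // cleanup runs after the library is terminated, as in the suspected failure.
    static List<String> simulate(boolean cleanupFirst) {
        List<String> events = new ArrayList<>();
        FakeNativeLibrary library = new FakeNativeLibrary();
        FakeNativeContext context = new FakeNativeContext(library, 42L);

        events.add("shutdown started");
        events.add("endpoint destroyed");
        if (cleanupFirst) {
            context.cleanup();
            events.add("context cleaned up");
        }
        library.terminate();
        events.add("native library terminated");
        if (!cleanupFirst) {
            try {
                context.cleanup(); // simulated late finalizer
                events.add("context cleaned up");
            } catch (IllegalStateException e) {
                events.add("crash: " + e.getMessage());
            }
        }
        return events;
    }

    public static void main(String[] args) {
        simulate(false).forEach(System.out::println);
        simulate(true).forEach(System.out::println);
    }
}
```

Calling simulate(false) models the failing order (terminate, then cleanup), while simulate(true) models the order a more robust fix would need to guarantee.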

I have some initial ideas on how we might handle this better. The quick and dirty fix was to force GC and add a delay in AprLifecycleListener.terminateAPR() but that was just a hack to test the theory.
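
As a rough sketch of what such a hack amounts to (the actual test change is not shown here, and GcDelayHack, terminateWithHack and the delay value are made up for illustration), the idea is to request a GC and wait before the native library is shut down:

```java
// Hypothetical illustration of the "force GC and add a delay" hack.
// This is not the code used in AprLifecycleListener.terminateAPR().
final class GcDelayHack {

    // Requests a GC, waits, then runs the supplied native shutdown action.
    static void terminateWithHack(Runnable terminateNative, long delayMillis) {
        // A request only: the JVM does not guarantee that a collection,
        // or any pending finalization, happens as a result of this call.
        System.gc();
        try {
            // Give the finalizer thread a chance to run before native shutdown.
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            // Preserve the interrupt status and carry on with shutdown.
            Thread.currentThread().interrupt();
        }
        terminateNative.run();
    }

    public static void main(String[] args) {
        terminateWithHack(() -> System.out.println("native library terminated"), 500L);
    }
}
```

Because System.gc() is only a hint and the delay is arbitrary, this narrows the window for the race rather than closing it, which is why it is useful for testing the theory but not as a fix.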

Back to working out a more robust fix...

While the fix worked well locally, it hasn't fixed the problem on the Buildbot CI worker.

I'm going to take another look.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org
