> - Why does the stack overflow when the JVM is loaded from Octave, and
>   not when the JVM is launched "normally"?  (What is that worker
>   thread doing that would cause it to use any appreciable amount of
>   stack space?)
>
> - Why, on bullseye, is it unable to create the worker thread in the
>   first place?  (The error message suggests that pthread_create
>   doesn't like the specified attributes for some reason - why is
>   that?)  And again, why only in Octave?

It appears that the answers to both questions are the same: because
Octave uses a (relatively speaking) huge thread-local storage area,
and, at least when using glibc, this size must be included in the
"thread stack size" that is requested when calling pthread_create.

If I use gdb to break on pthread_create and inspect glibc's
__static_tls_size (in bytes):

- when running octave 5.2.0 on bullseye: __static_tls_size = 156736

- when running octave 4.4.1 on buster: __static_tls_size = 95296

- when running a simple Java test program: __static_tls_size = 4160

which would seem to explain all of these issues.
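
As a rough check of the arithmetic, here is a small sketch.  It assumes
that the size requested for the worker thread is the OpenJDK 11 figure
quoted in this message (139264 bytes), and that glibc simply carves the
static TLS area out of the requested size; guard pages and other
per-thread overhead are ignored.  The class and method names are made
up for illustration.

```java
public class TlsBudget {
    // Bytes left for the actual stack once the static TLS area is taken
    // out of the size passed to pthread_create (simplified model; see the
    // assumptions stated above).
    static long usableStack(long requestedStackSize, long staticTlsSize) {
        return requestedStackSize - staticTlsSize;
    }

    public static void main(String[] args) {
        long requested = 139264; // OpenJDK 11 figure quoted in this message

        String[] label = {
            "octave 5.2.0 on bullseye",
            "octave 4.4.1 on buster",
            "simple Java test program"
        };
        long[] staticTls = {156736, 95296, 4160}; // values observed in gdb

        for (int i = 0; i < label.length; i++) {
            System.out.printf("%-26s %8d%n",
                    label[i], usableStack(requested, staticTls[i]));
        }
    }
}
```

Under those assumptions the bullseye case comes out negative (the TLS
area alone is larger than the requested stack), the buster case leaves
about 43 KB, and the plain Java program leaves nearly the whole
request.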

(OpenJDK 8 seems to use a somewhat larger stack size by default than
OpenJDK 11 does - 233472 vs 139264 bytes.  I'm not sure why this is
different from the stackSize passed to the Thread constructor, but it
is.)

So should this be considered a bug in OpenJDK, or in Octave?

I don't think it's reasonable to expect Octave to limit its use of
thread-local storage in order to fit an arbitrary undocumented limit
that was picked out of thin air by the Java developers.  On the other
hand, it's not entirely unreasonable for Java to attempt to avoid
allocating a huge stack when it creates an internal worker thread that
doesn't need one.

(At the same time, I do think it is wrong for OpenJDK not to account
for the possibility of failure in this case.  If the reaper thread
crashes, that should cause waitFor to fail with an exception, but not
to hang.)

I don't know whether pthreads has any way to create a thread *without*
giving it a copy of the global static TLS area, and I don't know
whether that would work at all even if it were possible.  I also don't
know if there's any remotely portable way for a program to find out
the size of its own static TLS area.  So I have no idea how difficult
it would be to fix this bug at the OpenJDK level.

Certainly it seems that the simplest fix would be for Octave to set
the jdk.lang.processReaperUseDefaultStackSize property.
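
For illustration, a minimal sketch of setting that property from Java
code.  It assumes the property is read only once, when the JDK's
process-handling class is first initialized, so it has to be set
before the first child process is started.  The class and method names
are hypothetical, and "true" is just the standard Unix command that
exits immediately.  Octave itself would more likely pass
-Djdk.lang.processReaperUseDefaultStackSize=true when it creates the
JVM.

```java
public class ReaperStackWorkaround {
    static final String PROP = "jdk.lang.processReaperUseDefaultStackSize";

    // Ask the JDK to give the process reaper thread the default stack
    // size instead of its reduced one.  Assumed to take effect only if
    // called before any child process has been started.
    static void useDefaultReaperStack() {
        System.setProperty(PROP, "true");
    }

    public static void main(String[] args) throws Exception {
        useDefaultReaperStack();

        // Start a trivial child process and wait for it; waitFor relies
        // on the reaper thread to report the exit status.
        Process p = new ProcessBuilder("true").start();
        System.out.println("exit status: " + p.waitFor());
    }
}
```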
