https://bz.apache.org/bugzilla/show_bug.cgi?id=60461

--- Comment #23 from Christopher Schultz <ch...@christopherschultz.net> ---
Some initial investigation based upon the initial bug report + crash dump.

1. The call from Java is AprSSLSupport.getCipherSuite
2. AprSSLSupport.getCipherSuite calls SSLSocket.getInfoS [native]
... with arg values [socketRef, a pointer to a tcn_socket_t, Java-long]
                and [SSL.SSL_INFO_CIPHER, value=0x02 Java-int]
3. The native call, for SSL_INFO_CIPHER should execute the following code
(compiled without DEBUG):

TCN_IMPLEMENT_CALL(jstring, SSLSocket, getInfoS)(JNIEnv *e, jobject o,
                                                 jlong sock, jint what)
{
    tcn_socket_t   *a = J2P(sock, tcn_socket_t *);
    tcn_ssl_conn_t *s;
    jstring value = NULL;
    apr_status_t rv = APR_SUCCESS;

    (o)=(o); // Don't complain that we aren't referencing a function arg

    s = (tcn_ssl_conn_t *)(a->opaque);

    if(what == SSL_INFO_CIPHER) // this is actually a switch case
        value = tcn_new_string(e, SSL_get_cipher_name(s->ssl));

    if (rv != APR_SUCCESS) // Doesn't happen
        tcn_ThrowAPRException(e, rv);

    return value;
}

There really aren't many opportunities for a crash, here:

1. Without DEBUG enabled, the value of "sock" isn't tested to be non-NULL after
the (o)=(o) line, so theoretically the local 'a' could be NULL
2. The value of 's' is never checked against NULL, and could segfault when
dereferencing s->ssl

The dump says the fault is in *this* function, and e.g. not a function called
by it.
SSL_get_cipher_name returns NULL if there isn't any SSL session.[1]
The tcn_new_string function returns NULL if the argument is NULL, so it's safe
to call without checking the return value of SSL_get_cipher_name.

In the Windows dump, the referenced address is spelled-out: "reading address
0x0000000000000130". That's not "NULL".
In the Linux dump, the referenced address is not directly shown.

The Linux dump tells us where in the function we are:
    Java_org_apache_tomcat_jni_SSLSocket_getInfoS+0x19

That's 25 bytes into the function, which is "not very far". I'm not entirely
sure if the instructions for setting-up the stack for local variables are
included in that 0x19, but if they are, then it's possible that we are talking
about the very beginning of the function, here.

Here's the gdb disassembly of the beginning of the function:

(gdb) disassemble Java_org_apache_tomcat_jni_SSLSocket_getInfoS
Dump of assembler code for function
Java_org_apache_tomcat_jni_SSLSocket_getInfoS:
   0x00000000000200f0 <+0>:     mov    QWORD PTR [rsp-0x30],rbx
   0x00000000000200f5 <+5>:     mov    QWORD PTR [rsp-0x20],r12
   0x00000000000200fa <+10>:    mov    ebx,ecx
   0x00000000000200fc <+12>:    mov    QWORD PTR [rsp-0x18],r13
   0x0000000000020101 <+17>:    mov    QWORD PTR [rsp-0x28],rbp
   0x0000000000020106 <+22>:    mov    r12,rdi
   0x0000000000020109 <+25>:    mov    QWORD PTR [rsp-0x10],r14    <---- crash
here?!
   0x000000000002010e <+30>:    mov    QWORD PTR [rsp-0x8],r15

At this point, no accesses have taken place other than things on the stack.
Basically, we haven't actually done anything, yet.

If these really are two reports of the same bug, then the Windows report tells
us we are trying to read from a bad location. Everything up through
getInfoS+0x19 reads from registers, which should never cause a SIGSEGV.

:(

I'm trying to make some sense of the stacks in these dumps but ... they aren't
making any sense to me... things like stack frames that are thousands of bytes
large, missing data (0x2 has to be in there somewhere!) but I'll have to save
that for another day.

[1] https://www.openssl.org/docs/man1.1.0/ssl/SSL_get_cipher_name.html

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to