https://bz.apache.org/bugzilla/show_bug.cgi?id=67938

            Bug ID: 67938
           Summary: Tomcat mishandles large client hello messages
           Product: Tomcat 10
           Version: 10.1.15
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Connectors
          Assignee: dev@tomcat.apache.org
          Reporter: aogb...@redhat.com
  Target Milestone: ------

A Java client application that previously ran on Java 11 began seeing
handshake failures against Tomcat 10.1 after the client app moved to Java 17.
OpenJDK engineers reviewed the issue and, based on the evidence gathered so
far and a static code analysis, we think that there is a problem in how
Apache Tomcat handles TLS handshakes containing large Client Hello packets.
We know that versions 10.1.9 to 10.1.15 are affected, but we have not looked
into other major releases.

What follows is a high-level overview of the sequence of events, as we
understand it, when the failure manifests:

1) The TLS client sends a Client Hello packet to resume a TLS 1.3 session. The
packet is so large (26,660 bytes) that it has to be split into two TLS
records. This splitting occurs at the TLS level, above any possible TCP
fragmentation. The first TLS record has a length of 16,372 bytes and the second
a length of 10,298 bytes (5 bytes of each TLS record are for the header, and
the rest accounts for the Client Hello payload).
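
As a quick sanity check of the byte accounting above (the class and variable
names here are our own, not Tomcat's):

```java
public class RecordMath {
    public static void main(String[] args) {
        final int headerLen = 5;      // TLS record header: type (1) + version (2) + length (2)
        final int record1 = 16_372;   // first TLS record, header included
        final int record2 = 10_298;   // second TLS record, header included

        int onWire = record1 + record2;            // total bytes crossing the wire
        int clientHello = onWire - 2 * headerLen;  // Client Hello payload after stripping headers

        System.out.println(onWire);      // 26670
        System.out.println(clientHello); // 26660
    }
}
```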

2) The method org.apache.tomcat.util.SecureNioChannel::handshake handles the
incoming connection, on the TLS server side [1]. In particular,
org.apache.tomcat.util.SecureNioChannel::processSNI is called first to peek at
the incoming data and check, for example, if the SNI TLS extension is present
[2].

3) The most relevant outcomes of the
org.apache.tomcat.util.SecureNioChannel::processSNI call are:
 3.1) The SNI TLS extension is not present. This was probably decided here [3]
because the Client Hello didn't fit into a single TLS record. SNI was not
present anyway.
 3.2) A new SSLEngine instance is created for the incoming connection.
 3.3) The netInBuffer ByteBuffer is filled with bytes from the first TLS record
sent by the client, and might include some but not all the bytes from the
second TLS record. This is because netInBuffer is initialized to a default size
of 16,921 bytes, and both TLS records total 26,670 bytes. netInBuffer is
expanded to sslEngine.getSession().getPacketBufferSize() after a read from the
network [4] but in practice, because there was no data passed to the SSLEngine
yet, this is probably 16,709 bytes (max record size, taken from
SSLRecord.maxRecordSize). Expanding to a smaller length has no effect. As a
result, netInBuffer has a likely size of 16,921 bytes and is completely full of
data.
 3.4) netInBuffer is assumed to be in a write-ready state at this point, which
means that position is set to the end of the filled data, limit is set to
capacity, and more bytes can be appended. However, if it's completely full as
assumed in #3.3, position would then be equal to limit (which is, in turn,
equal to capacity) and more bytes cannot be appended.
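
The buffer state described in #3.3 and #3.4 can be reproduced in isolation
with a plain java.nio.ByteBuffer. This is a standalone sketch (a small
capacity stands in for the real 16,921 bytes), not Tomcat code:

```java
import java.nio.ByteBuffer;

public class FullBufferDemo {
    public static void main(String[] args) {
        // A small capacity stands in for netInBuffer's default 16,921 bytes.
        ByteBuffer netInBuffer = ByteBuffer.allocate(16);
        netInBuffer.put(new byte[16]);   // completely filled, as in #3.3

        // Write-ready but full: position == limit == capacity,
        // so no further bytes can be appended (#3.4).
        System.out.println(netInBuffer.position() == netInBuffer.limit()); // true
        System.out.println(netInBuffer.remaining());                       // 0
    }
}
```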

4) When returning from org.apache.tomcat.util.SecureNioChannel::processSNI to
org.apache.tomcat.util.SecureNioChannel::handshake, the field sniComplete is
set to true reflecting that no further calls to ::processSNI are needed for
this connection. Execution moves to
org.apache.tomcat.util.SecureNioChannel::handshakeUnwrap because the initial
state for an SSLEngine is NEED_UNWRAP [5].
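
The initial NEED_UNWRAP state can be confirmed against a stock JSSE
SSLEngine, independently of any Tomcat code:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class InitialStateDemo {
    public static void main(String[] args) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(false);   // server side, as in Tomcat
        engine.beginHandshake();
        // A server-side engine starts by waiting for data from the client.
        System.out.println(engine.getHandshakeStatus()); // NEED_UNWRAP
    }
}
```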

5) Once in org.apache.tomcat.util.SecureNioChannel::handshakeUnwrap, the
"netInBuffer.position() == netInBuffer.limit()" condition evaluates to true [6]
and the ByteBuffer::clear method is called on netInBuffer. Position is set to 0
and limit to capacity. As a result, any write to netInBuffer will overwrite
unprocessed data. This unprocessed data is the first TLS record and part of the
second TLS record, depending on how much is written.
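
The effect of ByteBuffer::clear on a buffer that still holds unprocessed data
can be shown in miniature (an illustration with made-up contents, not
Tomcat's code):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ClearOverwriteDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put("AAAAAAAA".getBytes(StandardCharsets.US_ASCII)); // full of unprocessed data

        buf.clear();          // position = 0, limit = capacity; contents untouched,
                              // but the next write lands on top of the old data
        buf.put((byte) 'B');  // overwrites the first unprocessed byte, as in #5/#6

        buf.flip();           // read-ready for the consumer
        System.out.println((char) buf.get()); // B, not A: the head of the data is corrupt
    }
}
```

For comparison, the usual NIO idiom for a buffer that may still contain
unconsumed input is flip, read, then compact(), which moves unread bytes to
the front of the buffer instead of discarding them.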

6) More bytes are read into netInBuffer here [7]. The bytes read are probably
the remainder of the second TLS record (we know that it is past the TLS record
header and that it is at least 5 bytes long), and the overwrite occurs as
anticipated in #5. The data in netInBuffer is now corrupt.

7) The netInBuffer buffer is flipped to a read-ready state [8]. Thus, limit is
set to the last position after the overwrite and position is set to 0.

8) netInBuffer is passed to the SSLEngine for unwrapping. The SSLEngine finds
data at the beginning of the buffer that does not correspond to the beginning
of a TLS record, and fails, throwing the exception shown in the server log.
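
A rough sketch of why the unwrap fails (this models the record framing only;
it is not JSSE's actual validation code): the engine expects a well-formed
5-byte record header at the start of the buffer, and bytes from the middle of
a record will not look like one:

```java
public class RecordHeaderCheck {
    // Illustration only: a TLS record header is 1 byte of content type,
    // 2 bytes of protocol version, and 2 bytes of payload length.
    static boolean looksLikeTlsRecord(byte[] b) {
        if (b.length < 5) {
            return false;
        }
        int contentType = b[0] & 0xFF;  // 20..23: change_cipher_spec .. application_data
        int versionMajor = b[1] & 0xFF; // 3 for every TLS version
        return contentType >= 20 && contentType <= 23 && versionMajor == 3;
    }

    public static void main(String[] args) {
        // 22 = handshake, version 3.3, payload length 0x3FEF (16,367 bytes)
        byte[] recordStart = {22, 3, 3, 0x3F, (byte) 0xEF};
        // Five bytes taken from the middle of a Client Hello payload
        byte[] midRecord = {0x41, 0x41, 0x41, 0x41, 0x41};

        System.out.println(looksLikeTlsRecord(recordStart)); // true
        System.out.println(looksLikeTlsRecord(midRecord));   // false
    }
}
```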

We think that this error may not show up consistently due to network/OS timing
conditions. Different JDK releases, server configurations and TLS protocol
versions may also affect the length of the Client Hello message and have an
impact on reproducibility. The reason Client Hello messages for resumption are
large in the analyzed client application with OpenJDK 17 is that a large
resumption ticket is passed, but large messages (spanning multiple TLS
records) are compliant with the standard and should be handled appropriately.
A backport for OpenJDK 17 is also being pursued to reduce the message size in
this case. Workarounds in this particular case have included keeping the
client application on Java 11, limiting it to TLSv1.2, or setting
"jdk.tls.client.enableSessionTicketExtension=false" on the client.
Nonetheless, this looks like a flaw to address in Tomcat: large Client Hello
messages should be handled correctly whether they arise from a circumstance
like the above or from something else.
