Dmytro Shteflyuk created THRIFT-5942:
----------------------------------------

             Summary: Incorrect connection timeout handling in {{TSocket}} / 
{{TSSLSocket}}
                 Key: THRIFT-5942
                 URL: https://issues.apache.org/jira/browse/THRIFT-5942
             Project: Thrift
          Issue Type: Bug
          Components: Ruby - Library
            Reporter: Dmytro Shteflyuk


The Ruby transport has inconsistent and partially broken timeout behavior 
during connection establishment.

Plain TCP open does not appear to hang forever, but it handles timeout and 
cleanup poorly.

SSL open is worse: the handshake loop can retry indefinitely when the readiness 
wait times out, so a caller-provided timeout is not reliably enforced.

This affects:
 * {{lib/rb/lib/thrift/transport/socket.rb}}
 * {{lib/rb/lib/thrift/transport/ssl_socket.rb}}

h2. Problem

The Ruby transport currently uses one code path for TCP connect and a separate 
retry loop for SSL handshake, and both have issues.

h3. Plain TCP issues

In {{Socket#open}}:
 * connect waits for writability after {{connect_nonblock}}
 * if the wait call times out, the code falls through to the next address 
candidate
 * that timeout is not surfaced as {{TransportException::TIMED_OUT}}
 * the failed candidate socket is not closed before trying the next address
 * if all candidates fail through that path, the final error can be 
uninformative or lose the underlying cause
 * {{timeout == 0}} behaves like an immediate timeout during open, which is 
inconsistent with the existing read/write behavior, where {{nil}} and {{0}} 
both mean blocking
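
The behavior these points ask for can be sketched roughly as follows. This is a minimal illustration, not the actual patch: {{open_with_deadline}}, {{connect_one}}, {{monotonic_now}} and {{ConnectTimeout}} are hypothetical names, and {{ConnectTimeout}} stands in for {{TransportException::TIMED_OUT}} so the snippet runs without the thrift gem. Raising as soon as the wait times out, rather than trying further candidates, is an assumption about the desired semantics.

```ruby
require 'socket'

# Hypothetical stand-in for Thrift::TransportException with type TIMED_OUT.
class ConnectTimeout < StandardError; end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical helper: nonblocking connect for one candidate socket.
def connect_one(sock, addr, deadline)
  sock.connect_nonblock(addr.to_sockaddr)
rescue IO::WaitWritable
  # Wait only for what is left of the shared deadline (nil blocks indefinitely).
  remaining = deadline && [deadline - monotonic_now, 0].max
  unless IO.select(nil, [sock], nil, remaining)
    raise ConnectTimeout, "connect to #{addr.inspect_sockaddr} timed out"
  end
  begin
    # A second call surfaces the real connect error, if there was one.
    sock.connect_nonblock(addr.to_sockaddr)
  rescue Errno::EISCONN
    # Already connected: success.
  end
end

# Hypothetical helper: tries each resolved address under one shared deadline.
def open_with_deadline(host, port, timeout)
  # nil or 0 means "no deadline", mirroring the read/write semantics.
  deadline = timeout && timeout > 0 ? monotonic_now + timeout : nil
  last_error = nil

  Addrinfo.getaddrinfo(host, port, nil, :STREAM).each do |addr|
    sock = ::Socket.new(addr.afamily, ::Socket::SOCK_STREAM, 0)
    begin
      connect_one(sock, addr, deadline)
      return sock
    rescue ConnectTimeout
      sock.close # never leak the candidate socket
      raise      # surface the timeout instead of falling through
    rescue SystemCallError => e
      sock.close
      last_error = e # keep the real cause for the final error
    end
  end

  raise last_error || SocketError.new("no addresses for #{host}:#{port}")
end
```

Here every failed candidate is closed before the next one is tried, and the last system error is re-raised if no candidate succeeds, so the real cause is not lost.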

Important note: plain TCP does *not* seem to loop forever in the same way as 
SSL, but it still mishandles timeout semantics and cleanup.

h3. SSL issues

In {{SSLSocket#open}}:
 * the code wraps the TCP socket in {{OpenSSL::SSL::SSLSocket}}
 * it calls {{connect_nonblock}}
 * on {{IO::WaitReadable}} / {{IO::WaitWritable}}, it does a readiness wait and 
then unconditionally retries

This creates a real bug:
 * if the readiness wait returns {{nil}} on timeout, the code still retries
 * each retry effectively gets a fresh full timeout window
 * repeated read/write wait states during handshake can therefore exceed the 
user-supplied timeout indefinitely
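
One way to bound the handshake is to compute a single deadline up front and give each readiness wait only the time that remains. The sketch below illustrates that idea under stated assumptions: {{handshake_with_deadline}}, {{wait_until_ready}}, {{monotonic_now}} and {{HandshakeTimeout}} are hypothetical names (the last one stands in for {{TransportException::TIMED_OUT}}), and {{ssl}} is assumed to be any object that responds to {{connect_nonblock}} and {{to_io}}, as {{OpenSSL::SSL::SSLSocket}} does.

```ruby
# Hypothetical stand-in for Thrift::TransportException with type TIMED_OUT.
class HandshakeTimeout < StandardError; end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical helper: waits for readiness within what remains of the deadline.
def wait_until_ready(readers, writers, deadline)
  remaining = deadline && deadline - monotonic_now
  expired = remaining && remaining <= 0
  # A nil result from IO.select means the wait timed out; do not retry then.
  return if !expired && IO.select(readers, writers, nil, remaining)

  raise HandshakeTimeout, 'SSL handshake timed out'
end

# Hypothetical helper: drives a nonblocking handshake under one overall deadline.
def handshake_with_deadline(ssl, timeout)
  # nil or 0 means "no deadline", mirroring the read/write semantics.
  # The deadline is computed once, outside the retried block, so retries
  # do not get a fresh timeout window.
  deadline = timeout && timeout > 0 ? monotonic_now + timeout : nil
  begin
    ssl.connect_nonblock
  rescue IO::WaitReadable
    wait_until_ready([ssl.to_io], nil, deadline)
    retry
  rescue IO::WaitWritable
    wait_until_ready(nil, [ssl.to_io], deadline)
    retry
  end
end
```

The explicit {{expired}} check matters: if the socket keeps reporting readiness while the handshake keeps asking to wait, {{IO.select}} alone would never time out, so the elapsed time has to be checked on every pass.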



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
