Dmytro Shteflyuk created THRIFT-5942:
----------------------------------------
Summary: Incorrect connection timeout handling in {{TSocket}} / {{TSSLSocket}}
Key: THRIFT-5942
URL: https://issues.apache.org/jira/browse/THRIFT-5942
Project: Thrift
Issue Type: Bug
Components: Ruby - Library
Reporter: Dmytro Shteflyuk
The Ruby transport has inconsistent and partially broken timeout behavior
during connection establishment.
Plain TCP open does not appear to hang forever, but it does not report timeouts
as timeouts and does not close failed candidate sockets.
SSL open is worse: the handshake loop can retry indefinitely when the readiness
wait times out, so a caller-provided timeout is not reliably enforced.
This affects:
* {{lib/rb/lib/thrift/transport/socket.rb}}
* {{lib/rb/lib/thrift/transport/ssl_socket.rb}}
h2. Problem
The Ruby transport currently uses one code path for TCP connect and a separate
retry loop for SSL handshake, and both have issues.
h3. Plain TCP issues
In {{Socket#open}}:
* connect waits for writability after {{connect_nonblock}}
* if the wait call times out, the code falls through to the next address
candidate
* that timeout is not surfaced as {{TransportException::TIMED_OUT}}
* the failed candidate socket is not closed before trying the next address
* if all candidates fail through that path, the final error can be
uninformative or lose the underlying cause
* {{timeout == 0}} behaves like immediate timeout during open, which is
inconsistent with the existing read/write behavior where {{nil}} and {{0}} mean
blocking
Important note: plain TCP does *not* seem to loop forever in the same way as
SSL, but it still mishandles timeout semantics and cleanup.
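For illustration only, the plain TCP behavior described above could be tightened roughly as follows. This is a minimal standalone sketch, not the code in {{socket.rb}} and not necessarily the eventual fix. {{ConnectTimeout}} is a hypothetical stand-in for {{TransportException::TIMED_OUT}}, the helper names {{connect_candidate}} and {{open_socket}} are made up, and the sketch assumes that {{nil}} and {{0}} should mean blocking and that the timeout applies per address candidate.

```ruby
require 'socket'

# Hypothetical stand-in for Thrift::TransportException with type TIMED_OUT.
class ConnectTimeout < StandardError; end

# Connects one candidate socket, waiting at most `timeout` seconds.
# Assumption: nil or 0 means "block until the connect resolves".
def connect_candidate(sock, addr, timeout)
  sock.connect_nonblock(addr)
rescue IO::WaitWritable
  wait = (timeout && timeout > 0) ? timeout : nil
  unless IO.select(nil, [sock], nil, wait)
    # Surface the timeout instead of silently moving to the next address.
    raise ConnectTimeout, "connect to #{addr.inspect_sockaddr} timed out after #{timeout}s"
  end
  begin
    # Re-check the result: raises the real error if the connect failed.
    sock.connect_nonblock(addr)
  rescue Errno::EISCONN
    # Already connected, which means the connect succeeded.
  end
end

# Tries each resolved address in turn and returns the first connected socket.
def open_socket(host, port, timeout)
  last_error = nil
  Addrinfo.getaddrinfo(host, port, nil, :STREAM).each do |addr|
    sock = ::Socket.new(addr.afamily, ::Socket::SOCK_STREAM, 0)
    begin
      connect_candidate(sock, addr, timeout)
      return sock
    rescue ConnectTimeout, SystemCallError => e
      sock.close      # do not leak the failed candidate
      last_error = e  # keep the real cause for the final error
    end
  end
  raise last_error || SocketError.new("no addresses for #{host}:#{port}")
end
```

With this shape, a caller that exhausts all candidates sees the last concrete failure (a timeout or an {{Errno}} error) rather than a generic one. Whether the timeout should instead be a single budget shared across all candidates is a design choice this sketch does not settle.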
h3. SSL issues
In {{SSLSocket#open}}:
* the code wraps the TCP socket in {{OpenSSL::SSL::SSLSocket}}
* it calls {{connect_nonblock}}
* on {{IO::WaitReadable}} / {{IO::WaitWritable}}, it does a readiness wait and
then unconditionally retries
This creates a real bug:
* if the readiness wait returns {{nil}} on timeout, the code still retries
* each retry effectively gets a fresh full timeout window
* repeated read/write wait states during handshake can therefore exceed the
user-supplied timeout indefinitely
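One way to bound the handshake is to compute a single deadline up front and pass only the remaining time to each readiness wait. The following is a hedged sketch of that idea, not the code in {{ssl_socket.rb}}. {{HandshakeTimeout}} is a hypothetical stand-in for {{TransportException::TIMED_OUT}}, {{handshake_with_deadline}} is a made-up helper name, and the sketch assumes {{nil}} and {{0}} mean blocking. It relies on {{connect_nonblock(exception: false)}} returning {{:wait_readable}} or {{:wait_writable}} instead of raising.

```ruby
# Hypothetical stand-in for Thrift::TransportException with type TIMED_OUT.
class HandshakeTimeout < StandardError; end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Drives a non-blocking handshake under one overall deadline.
# `ssl` is expected to behave like OpenSSL::SSL::SSLSocket: it responds to
# connect_nonblock(exception: false) and to_io.
# Assumption: a timeout of nil or 0 means "block without a deadline".
def handshake_with_deadline(ssl, timeout)
  deadline = (timeout && timeout > 0) ? monotonic_now + timeout : nil

  loop do
    result = ssl.connect_nonblock(exception: false)
    return ssl unless result == :wait_readable || result == :wait_writable

    # Only the time left until the deadline is available for this wait.
    remaining = deadline && (deadline - monotonic_now)
    if remaining && remaining <= 0
      raise HandshakeTimeout, "SSL handshake timed out after #{timeout}s"
    end

    io = ssl.to_io
    ready = if result == :wait_readable
              IO.select([io], nil, nil, remaining)
            else
              IO.select(nil, [io], nil, remaining)
            end

    # IO.select returns nil on timeout: stop instead of retrying.
    raise HandshakeTimeout, "SSL handshake timed out after #{timeout}s" if ready.nil?
  end
end
```

Because the deadline is fixed before the loop starts, repeated read/write wait states share one budget instead of each receiving a fresh full timeout window.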