[
https://issues.apache.org/jira/browse/THRIFT-5942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmytro Shteflyuk updated THRIFT-5942:
-------------------------------------
Summary: Incorrect connection timeout handling in TSocket / TSSLSocket
(was: Incorrect connection timeout handling in {{TSocket}} / {{TSSLSocket}})
> Incorrect connection timeout handling in TSocket / TSSLSocket
> -------------------------------------------------------------
>
> Key: THRIFT-5942
> URL: https://issues.apache.org/jira/browse/THRIFT-5942
> Project: Thrift
> Issue Type: Bug
> Components: Ruby - Library
> Reporter: Dmytro Shteflyuk
> Priority: Major
>
> The Ruby transport has inconsistent and partially broken timeout behavior
> during connection establishment.
> Plain TCP open does not appear to hang forever, but it fails to surface
> timeouts as errors and does not clean up failed candidate sockets.
> SSL open is worse: the handshake loop can retry indefinitely when the
> readiness wait times out, so a caller-provided timeout is not reliably
> enforced.
> This affects:
> * {{lib/rb/lib/thrift/transport/socket.rb}}
> * {{lib/rb/lib/thrift/transport/ssl_socket.rb}}
> h2. Problem
> The Ruby transport currently uses one code path for TCP connect and a
> separate retry loop for SSL handshake, and both have issues.
> h3. Plain TCP issues
> In {{Socket#open}}:
> * connect waits for writability after {{connect_nonblock}}
> * if the wait call times out, the code falls through to the next address
> candidate
> * that timeout is not surfaced as {{TransportException::TIMED_OUT}}
> * the failed candidate socket is not closed before trying the next address
> * if all candidates fail through that path, the final error can be
> uninformative or lose the underlying cause
> * {{timeout == 0}} behaves like an immediate timeout during open, which is
> inconsistent with the existing read/write behavior, where both {{nil}} and
> {{0}} mean blocking
> Important note: plain TCP does *not* seem to loop forever in the same way as
> SSL, but it still mishandles timeout semantics and cleanup.
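
The plain TCP points above can be illustrated with a minimal sketch. It is not the Thrift implementation: {{ConnectTimeout}} is a hypothetical stand-in for {{TransportException::TIMED_OUT}}, and {{normalize_timeout}}, {{open_with_timeout}} and {{connect_one}} are illustrative names. It assumes each address candidate gets the full timeout, closes every failed candidate, and treats {{nil}} and {{0}} as blocking.

```ruby
require 'socket'

# Hypothetical error type standing in for TransportException::TIMED_OUT.
class ConnectTimeout < StandardError; end

# nil and 0 both mean "block without a deadline", as for read/write.
def normalize_timeout(timeout)
  timeout.nil? || timeout.zero? ? nil : timeout
end

# Attempt a non-blocking connect on one candidate socket.
def connect_one(sock, addr, timeout)
  sock.connect_nonblock(addr.to_sockaddr)
rescue IO::WaitWritable
  # IO.select returns nil on timeout; a nil timeout blocks indefinitely.
  unless IO.select(nil, [sock], nil, timeout)
    raise ConnectTimeout,
          "connect to #{addr.inspect_sockaddr} timed out after #{timeout}s"
  end
  begin
    # A second call reports the real outcome (for example ECONNREFUSED).
    sock.connect_nonblock(addr.to_sockaddr)
  rescue Errno::EISCONN
    # The connection completed while we were waiting.
  end
end

# Try each resolved address in turn, closing every failed candidate and
# re-raising the last real error if none of them succeeds.
def open_with_timeout(host, port, timeout)
  timeout = normalize_timeout(timeout)
  last_error = nil

  Addrinfo.getaddrinfo(host, port, nil, :STREAM).each do |addr|
    sock = ::Socket.new(addr.afamily, ::Socket::SOCK_STREAM, 0)
    begin
      connect_one(sock, addr, timeout)
      return sock
    rescue SystemCallError, ConnectTimeout => e
      last_error = e
      sock.close
    end
  end

  raise last_error || SocketError.new("no addresses for #{host}:#{port}")
end
```

A caller would use something like {{open_with_timeout('127.0.0.1', 9090, 2)}} and map {{ConnectTimeout}} to the transport's own exception type. Whether the timeout should apply per candidate or to the whole open is a design choice the sketch does not settle.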
> h3. SSL issues
> In {{SSLSocket#open}}:
> * the code wraps the TCP socket in {{OpenSSL::SSL::SSLSocket}}
> * it calls {{connect_nonblock}}
> * on {{IO::WaitReadable}} / {{IO::WaitWritable}}, it does a readiness wait
> and then unconditionally retries
> This creates a real bug:
> * if the readiness wait returns {{nil}} on timeout, the code still retries
> * each retry effectively gets a fresh full timeout window
> * repeated read/write wait states during handshake can therefore exceed the
> user-supplied timeout by an unbounded amount
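
One way to bound the handshake is to compute a single deadline up front and give each readiness wait only the time that remains. The sketch below is illustrative, not the Thrift code: {{HandshakeTimeout}}, {{monotonic_now}}, {{remaining}} and {{ssl_connect_with_deadline}} are hypothetical names, and it assumes {{nil}} and {{0}} mean "no deadline".

```ruby
require 'socket'
require 'openssl'

# Hypothetical error type standing in for TransportException::TIMED_OUT.
class HandshakeTimeout < StandardError; end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Seconds left before the deadline; a nil deadline means wait indefinitely.
def remaining(deadline)
  return nil if deadline.nil?

  left = deadline - monotonic_now
  raise HandshakeTimeout, 'SSL handshake timed out' if left <= 0
  left
end

# Drive connect_nonblock to completion under one overall deadline.
def ssl_connect_with_deadline(ssl_socket, timeout)
  # The deadline is fixed once, outside the retried block.
  deadline = timeout.nil? || timeout.zero? ? nil : monotonic_now + timeout

  begin
    ssl_socket.connect_nonblock
  rescue IO::WaitReadable
    ready = IO.select([ssl_socket.to_io], nil, nil, remaining(deadline))
    # A nil result means the wait timed out, so stop instead of retrying.
    raise HandshakeTimeout, 'SSL handshake timed out' unless ready
    retry
  rescue IO::WaitWritable
    ready = IO.select(nil, [ssl_socket.to_io], nil, remaining(deadline))
    raise HandshakeTimeout, 'SSL handshake timed out' unless ready
    retry
  end

  ssl_socket
end
```

Because the deadline is computed once, repeated read/write wait states share the same time budget rather than each receiving a fresh window.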
--
This message was sent by Atlassian Jira
(v8.20.10#820010)