[ 
https://issues.apache.org/jira/browse/THRIFT-5942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Shteflyuk updated THRIFT-5942:
-------------------------------------
    Summary: Incorrect connection timeout handling in TSocket / TSSLSocket  
(was: Incorrect connection timeout handling in {{TSocket}} / {{TSSLSocket}})

> Incorrect connection timeout handling in TSocket / TSSLSocket
> -------------------------------------------------------------
>
>                 Key: THRIFT-5942
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5942
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>            Reporter: Dmytro Shteflyuk
>            Priority: Major
>
> The Ruby transport has inconsistent and partially broken timeout behavior 
> during connection establishment.
> Plain TCP open does not appear to hang forever, but it handles timeout and 
> cleanup poorly.
> SSL open is worse: the handshake loop can retry indefinitely when the 
> readiness wait times out, so a caller-provided timeout is not reliably 
> enforced.
> This affects:
>  * {{lib/rb/lib/thrift/transport/socket.rb}}
>  * {{lib/rb/lib/thrift/transport/ssl_socket.rb}}
> h2. Problem
> The Ruby transport currently uses one code path for TCP connect and a 
> separate retry loop for SSL handshake, and both have issues.
> h3. Plain TCP issues
> In {{Socket#open}}:
>  * connect waits for writability after {{connect_nonblock}}
>  * if the wait call times out, the code falls through to the next address 
> candidate
>  * that timeout is not surfaced as {{TransportException::TIMED_OUT}}
>  * the failed candidate socket is not closed before trying the next address
>  * if all candidates fail through that path, the final error can be 
> uninformative or lose the underlying cause
>  * a timeout of {{0}} behaves like an immediate timeout during open, which is 
> inconsistent with the existing read/write behavior, where {{nil}} and {{0}} 
> both mean blocking
> Important note: plain TCP does *not* seem to loop forever in the same way as 
> SSL, but it still mishandles timeout semantics and cleanup.
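
The plain TCP points above could be addressed along the following lines. This is only a sketch, not the code in {{socket.rb}}: {{connect_any}} and {{ConnectTimeout}} are hypothetical names, the timeout is assumed to apply per address candidate, and the real library would presumably raise {{TransportException::TIMED_OUT}} rather than a custom error.

```ruby
require 'socket'

# Hypothetical error class standing in for TransportException::TIMED_OUT.
class ConnectTimeout < StandardError; end

# Hypothetical helper: try each resolved address in turn, close every failed
# candidate, and re-raise the last real error if none of them connects.
# Assumption: nil or 0 means "block until connected", as for read/write.
def connect_any(host, port, timeout)
  last_error = nil
  Addrinfo.getaddrinfo(host, port, nil, :STREAM).each do |addr|
    sock = ::Socket.new(addr.afamily, ::Socket::SOCK_STREAM, 0)
    begin
      begin
        sock.connect_nonblock(addr.to_sockaddr)
      rescue IO::WaitWritable
        wait = (timeout && timeout > 0) ? timeout : nil
        unless IO.select(nil, [sock], nil, wait)
          raise ConnectTimeout, "connect to #{addr.inspect_sockaddr} timed out"
        end
        begin
          # A second call reports the outcome of the pending connect.
          sock.connect_nonblock(addr.to_sockaddr)
        rescue Errno::EISCONN
          # Already connected: success.
        end
      end
      return sock
    rescue ConnectTimeout, SystemCallError => e
      sock.close          # do not leak the failed candidate
      last_error = e      # keep the real cause for the final error
    end
  end
  raise(last_error || SocketError.new("no addresses found for #{host}"))
end
```

With this shape, a timeout on the last candidate reaches the caller as a timeout instead of being swallowed, and no candidate socket stays open after a failure.
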
> h3. SSL issues
> In {{SSLSocket#open}}:
>  * the code wraps the TCP socket in {{OpenSSL::SSL::SSLSocket}}
>  * it calls {{connect_nonblock}}
>  * on {{IO::WaitReadable}} / {{IO::WaitWritable}}, it does a readiness wait 
> and then unconditionally retries
> This creates a real bug:
>  * if the readiness wait returns {{nil}} on timeout, the code still retries
>  * each retry effectively gets a fresh full timeout window
>  * repeated read/write wait states during handshake can therefore exceed the 
> user-supplied timeout indefinitely
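
One way to bound the handshake is to compute a single deadline up front and give each readiness wait only the time that remains. The sketch below illustrates that idea under stated assumptions; it is not the code in {{ssl_socket.rb}}. {{retry_until_deadline}}, {{monotonic_now}} and {{OpenTimeout}} are hypothetical names, and the real library would presumably raise {{TransportException::TIMED_OUT}}.

```ruby
require 'socket'

# Hypothetical error class standing in for TransportException::TIMED_OUT.
class OpenTimeout < StandardError; end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical helper: run the block (for example ssl.connect_nonblock) until
# it stops asking to wait. All retries share one deadline, so repeated
# read/write wait states cannot extend the total time beyond `timeout`.
# Assumption: nil or 0 means "block until ready", as for read/write.
def retry_until_deadline(io, timeout)
  bounded = timeout && timeout > 0
  deadline = monotonic_now + timeout if bounded
  begin
    yield
  rescue IO::WaitReadable, IO::WaitWritable => e
    remaining = bounded ? deadline - monotonic_now : nil
    raise OpenTimeout, "timed out after #{timeout}s" if bounded && remaining <= 0
    ready = if e.is_a?(IO::WaitReadable)
              IO.select([io], nil, nil, remaining)
            else
              IO.select(nil, [io], nil, remaining)
            end
    # IO.select returns nil on timeout: stop instead of retrying.
    raise OpenTimeout, "timed out after #{timeout}s" if ready.nil?
    retry
  end
end
```

For the SSL case the call would look roughly like {{retry_until_deadline(ssl, timeout) { ssl.connect_nonblock }}}, where {{ssl}} is the {{OpenSSL::SSL::SSLSocket}} wrapping the TCP socket.
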


