Luca Martella created CXF-9173:
----------------------------------

             Summary: Default SO_LINGER induces TCP RST on each connection 
closure
                 Key: CXF-9173
                 URL: https://issues.apache.org/jira/browse/CXF-9173
             Project: CXF
          Issue Type: Bug
          Components: Transports
    Affects Versions: 4.1.2
            Reporter: Luca Martella


After migrating from CXF 3.5.11 to 4.1.2, we noticed that TCP connections 
managed by the async HTTP conduit are abruptly closed with a TCP reset when the 
connection time-to-live (CONNECTION_TTL) expires, resulting in Connection Reset 
errors on the remote side.
 

Both CXF versions allow customisation of key parameters via the CXF bus:
- CONNECTION_TTL (default: 60000 ms): Duration a connection remains open.
- SO_LINGER (default: -1): Controls socket linger time, affecting how 
connections are closed.
 

The main change in CXF 4.x is that the SO_LINGER option is now expected in 
milliseconds, whereas it was expressed in seconds in CXF 3:

1. The value from the CXF bus is interpreted as milliseconds when creating the 
IOReactorConfig (see 
[code|https://github.com/apache/cxf/blob/cxf-4.1.2/rt/transports/http-hc5/src/main/java/org/apache/cxf/transport/http/asyncclient/hc5/AsyncHTTPConduitFactory.java#L341])
{code:java}
final IOReactorConfig config = IOReactorConfig.custom()
            .setSoLinger(TimeValue.ofMilliseconds(soLinger)){code}
2. It is then converted back to seconds when the reactor consumes the config 
(see 
[code|https://github.com/apache/httpcomponents-core/blob/rel/v5.4-alpha1/httpcore5/src/main/java/org/apache/hc/core5/reactor/SingleCoreIOReactor.java#L296])
{code:java}
final int linger = this.reactorConfig.getSoLinger().toSecondsIntBound();
if (linger >= 0) {
    socket.setSoLinger(true, linger);
} {code}

The problem occurs when the default value (-1) for the SO_LINGER option is 
used. In CXF 4.x, this value is first interpreted as -1 milliseconds and then 
converted to 0 seconds, because calling _toSecondsIntBound()_ on a TimeValue of 
-1 milliseconds truncates to 0.

As a result, the linger option is enabled with a timeout of 0, causing sockets 
to close immediately and trigger a TCP reset. 
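
The chain above can be reproduced with the JDK alone. This is a minimal sketch, not the HttpCore code: it uses java.util.concurrent.TimeUnit as a stand-in for TimeValue, on the assumption that _toSecondsIntBound()_ truncates toward zero in the same way. The class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

final class LingerConversionSketch {

    // Stand-in for TimeValue.ofMilliseconds(ms).toSecondsIntBound().
    // Assumption: the real conversion truncates toward zero like TimeUnit.
    static int toLingerSeconds(long configuredMillis) {
        long seconds = TimeUnit.MILLISECONDS.toSeconds(configuredMillis);
        // Clamp to the int range, as the "IntBound" suffix suggests.
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, seconds));
    }

    // Mirrors the guard quoted from SingleCoreIOReactor: linger is set when >= 0.
    static boolean lingerEnabled(long configuredMillis) {
        return toLingerSeconds(configuredMillis) >= 0;
    }

    public static void main(String[] args) {
        long defaultValue = -1; // default SO_LINGER read from the bus
        System.out.println("seconds = " + toLingerSeconds(defaultValue));      // seconds = 0
        System.out.println("linger enabled = " + lingerEnabled(defaultValue)); // linger enabled = true
    }
}
```

The negative sign is lost in the integer division, so the guard can no longer tell "disabled" apart from "linger for 0 seconds".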

That is definitely a difference in behaviour compared to CXF 3, where the 
default SO_LINGER value of -1 was meant to disable socket linger. 
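
For reference, the two socket states involved can be observed on a plain java.net.Socket. The sketch below is illustrative only: the _applyLinger_ helper is hypothetical and simply encodes the "negative means disabled" reading described for CXF 3. It is not code taken from either CXF version.

```java
import java.io.IOException;
import java.net.Socket;

final class SoLingerStatesSketch {

    // Hypothetical helper: a negative value disables linger,
    // a non-negative value enables it with that timeout in seconds.
    static void applyLinger(Socket socket, int lingerSeconds) throws IOException {
        if (lingerSeconds >= 0) {
            socket.setSoLinger(true, lingerSeconds);
        } else {
            socket.setSoLinger(false, 0);
        }
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) { // unconnected, options can still be set
            applyLinger(socket, -1);
            // getSoLinger() reports -1 when the option is disabled
            System.out.println("disabled: " + socket.getSoLinger());

            applyLinger(socket, 0);
            // linger enabled with a zero timeout, the state that leads to a reset on close
            System.out.println("enabled, zero timeout: " + socket.getSoLinger());
        }
    }
}
```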

1. Can you please clarify whether this change was made on purpose or whether 
it is a bug resulting from the various unit conversions?
2. We see that setting _org.apache.cxf.transport.http.async.SO_LINGER_ to -1000 
effectively disables the linger option, which aligns with the default behavior 
in CXF 3.x. Is this a valid workaround to prevent abrupt socket closures and 
TCP resets until the issue is clarified or resolved?
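
A small check of why -1000 works where -1 does not, under the same assumption that the millisecond-to-second conversion truncates toward zero as java.util.concurrent.TimeUnit does. The class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

final class LingerWorkaroundSketch {

    // True when the configured millisecond value still yields a negative
    // number of seconds, so a "linger >= 0" guard would skip setSoLinger.
    static boolean disablesLinger(long configuredMillis) {
        return TimeUnit.MILLISECONDS.toSeconds(configuredMillis) < 0;
    }

    public static void main(String[] args) {
        for (long millis : new long[] {-1, -999, -1000}) {
            System.out.println(millis + " ms -> linger disabled: " + disablesLinger(millis));
        }
        // -1 ms -> linger disabled: false
        // -999 ms -> linger disabled: false
        // -1000 ms -> linger disabled: true
    }
}
```

Under that assumption, any value of -1000 or lower keeps its sign after the conversion, while values between -999 and -1 collapse to 0.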

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
