Hi all,

I believe I found a server bug. If a client connects from unreliable network, 
server may lock up. In that state, it does not accept new connections, it 
cannot be stopped, and if connection improves, it does not become responsive 
again. The problem is confirmed on Raspbian and macOS so far.

I debugged it and found the root cause. The code below makes a socket blocking 
temporarily while writing into it. Then it reverts it to non-blocking only if 
operation succeeds, leaving it blocking if it fails:

liveMedia/RTPInterface.cpp:398:
```
      makeSocketBlocking(socketNum, RTPINTERFACE_BLOCKING_WRITE_TIMEOUT_MS);
      ....
? tlsState->write((char const*)(&data[numBytesSentSoFar]), 
numBytesRemainingToSend)
: send(socketNum, (char const*)(&data[numBytesSentSoFar]), 
numBytesRemainingToSend, 0/*flags*/);
      if ((unsigned)sendResult != numBytesRemainingToSend) {
      ....
removeStreamSocket(socketNum, 0xFF);
return False;
      }
      makeSocketNonBlocking(socketNum);
      return True;
    }
```

What happens next is when it attempts to read from a blocking socket in the 
loop of 2000 iterations below, it locks up forever (2000 x timeout is a long 
time):

liveMedia/RTPInterface.cpp:510:
```
void SocketDescriptor::tcpReadHandler(SocketDescriptor* socketDescriptor, int 
mask) {
  // Call the read handler until it returns false, with a limit to avoid 
starving other sockets
  unsigned count = 2000;
  socketDescriptor->fAreInReadHandlerLoop = True;
  while (!socketDescriptor->fDeleteMyselfNext && 
socketDescriptor->tcpReadHandler1(mask) && --count > 0) {}
  socketDescriptor->fAreInReadHandlerLoop = False;
  if (socketDescriptor->fDeleteMyselfNext) delete socketDescriptor;
}
```

I was able to mitigate it using the patch:

```
--- live/liveMedia/RTPInterface.cpp 2021-12-07 21:33:13.000000000 +0000
+++ live555/liveMedia/RTPInterface.cpp 2021-12-14 07:07:19.257748176 +0000
@@ -399,6 +399,7 @@
       sendResult = (tlsState != NULL && tlsState->isNeeded)
? tlsState->write((char const*)(&data[numBytesSentSoFar]), 
numBytesRemainingToSend)
: send(socketNum, (char const*)(&data[numBytesSentSoFar]), 
numBytesRemainingToSend, 0/*flags*/);
+      makeSocketNonBlocking(socketNum);
       if ((unsigned)sendResult != numBytesRemainingToSend) {
// The blocking "send()" failed, or timed out.  In either case, we assume that 
the
// TCP connection has failed (or is 'hanging' indefinitely), and we stop using 
it
@@ -411,7 +412,6 @@
removeStreamSocket(socketNum, 0xFF);
return False;
       }
-      makeSocketNonBlocking(socketNum);

       return True;
     } else if (sendResult < 0 && envir().getErrno() != EAGAIN) {
```

Any chance we can get this into the official build to avoid patching hell? 
Thanks

Andrei
_______________________________________________
live-devel mailing list
live-devel@lists.live555.com
http://lists.live555.com/mailman/listinfo/live-devel

Reply via email to