Hi, I've ran into a bug in libavformat's libsrt while using OBS. It's still present as of commit ed51890191. I've tracked down the issue and implemented a workaround, but a possible misunderstanding of SRT's asynchronous functionality which caused this issue seems to be permeating the implementation.
First, the initial issue: I have a Media Source (this uses libavcodec) in an OBS [0] scene which takes an SRT input. This is configured with the 'close file when inactive' option and with a 2s reconnect delay. The expectation is that if the stream stops / is disconnected (more specifically if it receives an SRT UMSG_SHUTDOWN packet which terminates the connection), this source will be hidden. In practice, I verified that the UMSG_SHUTDOWN packet was always delivered immediately, however most times the source would end up displaying the last frame for around a minute, then I'd get a message along the lines of 'remove_usock: @193058704 not found as either socket or group. Removing only from epoll system' in the terminal and then Media Source would finally close. Rarely, it worked as expected. Then I found that if I modified the SRT server feeding this Media Source to stop sending data, wait for a couple of seconds, and the close the connection, then it almost always worked as expected - but not quite every time. Long story short, I've tracked the issue to a race condition in libavformat/libsrt.c. The way SRT's async / epoll-based IO works is that the event status is stored in the epoll containers. That is, if an event occurs on an SRT socket, and that SRT socket isn't part of any epoll container, then that event is lost. If we later add that socket to an epoll container, we still won't receive the event even if it wasn't serviced. The documentation references this [1]. So what's happening in libavformat/libsrt.c is that libsrt_read() calls into libsrt_network_wait_fd_timeout() -> libsrt_network_wait_fd(), which adds the socket to an epoll container and waits for data, an error or a POLLING_TIME timeout. Then when libsrt_network_wait_fd() returns, it *removes* the socket from the epoll container. However, for the OBS use case to work correctly, the relevant event which must be captured by libsrt_network_wait_fd() is the SRT_EPOLL_ERR, which would cause AVERROR(EIO) to be returned. The event is only delivered once when the connection is closed. While data arrives and is serviced by the rest of the code, this event was typically lost. But if I waited for a few seconds after I stopped sending data before closing the connection, then it was typically captured as this thread was idle, which explains the behavior I've observed initially. This does raise the question of how come libsrt_read() doesn't lose data since it only reads a single packet per SRT_EPOLL_IN event. My guess is that some internal SRT logic repeatedly raises the SRT_EPOLL_IN event as long as there's data available in the read buffer. Not sure if this is deliberate or a coincidence, as it contradicts the documentation. I also wonder whether this means there are race conditions in libsrt_listen() and libsrt_listen_connect(), as they start asynchronous calls before the socket is added to the epoll container. My quick and dirty fix was to remove the call to srt_epoll_remove_usock() from libsrt_network_wait_fd(), so the socket is always in the epoll container with SRT_EPOLL_ET enabled. Then it works as expected. Internally in SRT, calls to srt_epoll_add_usock() and srt_epoll_update_usock() both end up calling s_UDTUnited.epoll_add_usock(), so for now the srt_epoll_add_usock() in libsrt_network_wait_fd() can both add the socket to the epoll container on the first call, and then update the flags on subsequent calls. But obviously this doesn't use the API correctly according to the documentation. Thanks and sorry about the wall of text [0] https://github.com/obsproject/obs-studio [1] https://github.com/Haivision/srt/blob/master/docs/API-functions.md#srt_epoll_update_usock _______________________________________________ ffmpeg-devel mailing list [email protected] https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email [email protected] with subject "unsubscribe".
