Hi:

An addition to my last message: when uv__nonblock() fails here, it
indicates a Linux FIONBIO ioctl() failure. What would cause setting
non-blocking mode to fail?

Best Regards,

Paul R.

On Friday, May 7, 2021 at 9:11:00 PM UTC-7 [email protected] wrote:

> Hi:
>
> I was able to track down and fix the problem causing uv_poll_init() to 
> return EEXIST.
> However, uv_poll_init() still returns EBADF intermittently and I traced 
> the problem
> to the following code segment but don't fully understand why.
>
> The EBADF is generated by the following code segment in uv_poll_init() in 
> src/unix/poll.c around line 90.
>
>   /* If ioctl(FIONBIO) reports ENOTTY, try fcntl(F_GETFL) + fcntl(F_SETFL).
>    * Workaround for e.g. kqueue fds not supporting ioctls.
>    */
>   err = uv__nonblock(fd, 1);
>   if (err == UV_ENOTTY)
>     if (uv__nonblock == uv__nonblock_ioctl)
>       err = uv__nonblock_fcntl(fd, 1);
>
> Do you have any insight about this ?
>
> I was able to verify that the following sequence of events can occur and 
> strongly suspect it can lead to corruption. This is what causes EEXIST to 
> be returned by uv_poll_init() calls.
>
>
> 1) The following sequence of calls executes because it has been decided 
>    polling needs to be terminated for a particular TCP connection.
>
>     uv_poll_stop(poll_handle_1);
>     uv_close((uv_handle_t *) poll_handle_1, close_callback);
>
> 2) Now it is decided polling needs to be terminated for another TCP 
>    connection, and the following sequence of calls executes.
>
>     uv_poll_stop(poll_handle_2);
>     uv_close((uv_handle_t *) poll_handle_2, close_callback);
>
> 3) The close_callback() function executes, but we don't know whether it 
>    executes first for the uv_close() call made with poll_handle_1 or the 
>    one made with poll_handle_2.
>
>
> Handling the problem requires knowing that close_callback() won't execute 
> until uv_run() executes. (I don't think this is documented.)
>
> The solution is that once a uv_poll_stop()/uv_close() sequence is started, 
> another such sequence must not be started until both close_callback() and 
> uv_run() have executed.
>
> In my case, this requires a variable, Poll_Releasing, which is set to TRUE 
> when such a sequence has been initiated and to FALSE when close_callback() 
> executes.
> It also requires putting T_POLL_DISMANTLE_REQ messages that arrive while 
> Poll_Releasing is TRUE on a temporary queue. Then, when IO_Task_Input_Q is 
> empty, the messages from the temporary queue are moved to IO_Task_Input_Q 
> and the poll_proxy() routine returns, which allows uv_run() to execute.
>
> The relevant code segments are as follows.
>
> 1) The poll_proxy() routine of the IO_Task(), in network_io.c.
>
>
> ROUTINE PRIVATE void poll_proxy(uv_async_t *handle)
> {
>     CONN_DESC *cdesc;
>     MSG *msg, *rsp;
>     MSG_FIFO Poll_Release_Q;
>     uv_poll_t *poll_handle;
>     uv_handle_t **ppdata;
>     int r;
>     BOOL done;
>
>     //
>     // Do nothing if execution is already in progress.
>     //
>     if(Poll_Proxy_Busy)
>         return;
>
>     Poll_Proxy_Busy = TRUE;
>     Poll_Release_Q.head = Poll_Release_Q.tail = NULL;
>     //
>     // Handle messages from other tasks.
>     //
>     done = FALSE;
>     while(done != TRUE)
>       {
>
> ENTER_MUTEX(&Async_Mutex);
>         msg = remove_msg(&IO_Task_Input_Q);
> EXIT_MUTEX(&Async_Mutex);
>
>         if(msg)
>           {
>             cdesc = (CONN_DESC *) msg->conn_desc;
>             syslog(LOG_DEBUG, "IO_TASK(Proxy): RX EVENT - CONN[%d] Type 
> %d\n", cdesc->index, msg->type);
>
>             switch(msg->type)
>             {
>
>             case T_POLL_CONFIG_REQ:
>                 //
>                 // Main process request to start Libuv/epoll() operation.
>                 //
>                 ... Code to handle message ...
>                 break;
>             case T_POLL_DISMANTLE_REQ:
>                 //
>                 // If a poll handle is being dismantled, you must
>                 // wait for the uv_close() callback routine to execute
>                 // before handling this message. In order for the callback
>                 // routine to execute uv_run() must be called.
>                 //
>                 if(Poll_Releasing)
>                   {
>                     //
>                     // Save the message on a temporary queue
>                     // and move the contents of that queue to
>                     // the task input queue when this routine returns.
>                     //
>                     insert_msg(&Poll_Release_Q, msg);
>                     msg = NULL;
>                     break;
>                   }
>                 else
>                     Poll_Releasing = TRUE;
>                 //
>                 // Protocol_Task() request to cease Libuv/epoll() 
> operation and
>                 // release the poll handle and its resources.
>                 //
>
>                 ppdata = (uv_handle_t **) &msg->buffer[0];
>                 poll_handle = (uv_poll_t *) *ppdata;
>
>                 if( (r = uv_poll_stop(poll_handle)) )
>                   {
>                     fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll Stop 
> Error %d, %s",
>                          r, uv_err_name(r) );
>                   }
>                 //
>                 // The callback routine notifies the Protocol_Task() when 
> everything is done.
>                 //
>                 poll_handle->data = (void *) cdesc;
>                 uv_close((uv_handle_t *) poll_handle, 
> async_close_callback);
>                 break;
>             default:
>                 fatal_error(LOG_EMERG, "IO_TASK: POLL_PROXY - Invalid 
> Message = %d", msg->type);
>
>             }
>
>             if(msg)
>                 MSG_FREE(msg);
>           }
>         else
>           {
>             done = TRUE;
>             //
>             // The input queue is now empty. Move the messages saved
>             // on the temporary queue to it.
>             //
>             while( (msg = remove_msg(&Poll_Release_Q)) )
>                {
> ENTER_MUTEX(&Async_Mutex);
>                 insert_msg(&IO_Task_Input_Q, msg);
> EXIT_MUTEX(&Async_Mutex);
>                }
>           }
>       }
>
>     Poll_Proxy_Busy = FALSE;
>
>     return;
> }
>
>
> 2) The async_close_callback() routine of the IO_Task(), in async.c.
>
> ROUTINE void async_close_callback(uv_handle_t *handle)
> {
>     MSG *msg;
>
>     CONN_DESC *cdesc = (CONN_DESC *) handle->data;
>     int type = handle->type;
>
>
>     VFREE((UCHAR *) handle);
>     //
>     // Send a notification message to the Protocol_Task.
>     //
>     msg = MSG_ALLOC(0, FALSE);
>     msg->class = C_NOTIFY;
>     msg->type = T_DISMANTLE_RSP;
>     msg->info = 0;
>
>     SendProtoMsg(cdesc, msg);
>     //
>     // Signal that the poll handle is released.
>     //
>     if(type == UV_POLL)
>       {
>         Poll_Releasing = FALSE;
>         syslog(LOG_INFO, "IO_Task: CONN[%d] Poll Release Complete", 
> cdesc->index);
>       }
>
>     return;
> }
>
>
>
>
>
> On 05/04/2021 06:21 PM, Paul Romero wrote:
>
> Hi:
>
> I have two questions about Libuv that I haven't seen addressed by the 
> documentation.
>
> First, when uv_poll_init() returns EEXIST, could it be indicative of a 
> transient problem rather than an operational problem or bug?
>
> Second, this question is related to concurrency but does not involve 
> concurrent execution. In fact, all the Libuv calls execute serially.
>
> 1) The following sequence of calls executes because it has been decided 
>    polling needs to be terminated for a particular TCP connection.
>
>     uv_poll_stop(poll_handle_1);
>     uv_close((uv_handle_t *) poll_handle_1, close_callback);
>
> 2) Now it is decided polling needs to be terminated for another TCP 
>    connection, and the following sequence of calls executes.
>
>     uv_poll_stop(poll_handle_2);
>     uv_close((uv_handle_t *) poll_handle_2, close_callback);
>
> 3) The close_callback() function executes, but we don't know whether it 
>    executes first for the uv_close() call made with poll_handle_1 or the 
>    one made with poll_handle_2.
>
> Can this happen and result in corruption ?
>
> Best Regards,
>
> Paul R.
>
> On 05/02/2021 06:51 AM, Jameson Nash wrote:
>
> You might want to try compiling your application with Clang TSan. 
> Although, since libuv is a single-threaded library (per uv_loop_t), any 
> access from multiple threads is generally just forbidden.
>
> On Sun, May 2, 2021 at 2:24 AM Paul Romero <[email protected]> 
> wrote:
>
>> Hi:
>>
>> A trace giving a more refined picture of the problem and more details is 
>> attached
>> to this note. It dumps what I consider to be the significant fields of 
>> the uv_loop_t structure,
>> the Poll_Loop in my code, and the dynamically allocated uv_poll_t 
>> structure surrounding
>> the essential Libuv API routine calls. I think the format of the dump is 
>> fairly intuitive
>> but please inform me if you disagree.
>>
>> My comments about what is happening are identified by the string 
>> "COMMENT:" and the
>> API calls are identified by the string "MPOLL". The trace is redacted to 
>> eliminate
>> irrelevant events.
>>
>> Please let me know if you see something amiss about the data structure 
>> fields at
>> any point.
>>
>> My server code is based on a simple event-oriented architecture and uses 
>> the following 5 concurrent tasks.
>>
>> 1) main(): Handles incoming connections and Linux signals. (Libuv based.)
>> 2) IO_Task(): Handles incoming data from the network. (Libuv based.)
>> 3) Protocol_Task(): Does most of the work. (Message based.)
>> 4) DB_Task(): Does database insertions. (Message based.)
>> 5) Timer_Task(): Handles timer management. (Mostly  Libuv based.)
>>
>> I can send you my server code and a test tool if it would be useful to 
>> you.
>> However, it may take a few days to put it in an easily usable form.
>>
>> As previously mentioned uv_accept() executes in the main() task and the
>> rest of the polling related Libuv API routines execute in the IO_Task().
>>
>> Finally, remember the problem is intermittent. Generally, it takes about 
>> 45 minutes to 3 hours of constant concurrent connections for the crash 
>> to occur.
>>
>> Best Regards,
>>
>> Paul R.
>>
>>
>>
>>
>> On 05/01/2021 12:01 AM, Paul Romero wrote:
>>
>> Hi:
>>
>> I may have overlooked the following factor.
>>
>> We know that Libuv is not reentrant and not really multi-task safe. I 
>> suspect, but
>> don't know, that the internal Libuv data structures attached to the 
>> uv_loop_t structure,
>> Poll_Loop in my case, are not protected by internal Libuv Mutexes.
>>
>> 1) uv_accept() and uv_fileno() execute in the main() thread.
>>
>> 2) uv_poll_init(), uv_poll_start() and uv_poll_stop(), execute in the 
>> IO_Task() thread.
>>
>> That means uv_accept() and the Libuv API routines in 2, above, can 
>> execute concurrently.
>> Is it possible that such concurrency could damage the internal Libuv data 
>> structures attached to the uv_loop_t structure? (Again, Poll_Loop in my 
>> case.)
>>
>> Best Regards,
>>
>> Paul R.
>>
>> NOTE: I did a few experiments and noticed that calling uv_close() while 
>> holding a Mutex often prevents its callback routine from executing.
>>
>> On 04/30/2021 04:59 PM, Paul Romero wrote:
>>
>>
>> Hi:
>>
>> I think I've isolated the problem and it appears to be in uv_poll_stop().
>>
>> In particular, if uv_accept() is called immediately after uv_poll_stop(), 
>> and it reallocates the same socket descriptor just freed by 
>> uv_poll_stop(), then uv_poll_init() fails with the error EEXIST or EBADF. 
>> This must have something to do with the uv_loop_t handle passed to 
>> uv_poll_init(), called Poll_Loop in my code.
>>
>> I had to instrument my code to keep track of Libuv API calls to see this.
>>
>> The following explanatory items follow.
>>
>> A - The sequence of calls that produces the problem.
>> B - Commented syslog() diagnostics.
>> C - The relevant code segments.
>> D - The gdb stack backtrace.
>>
>> Best Regards,
>>
>> Paul R.
>>
>>
>> A) The sequence of relevant calls is as follows, and the problem only 
>> occurs with this exact sequence of events.
>>
>> ****************************************************************************************
>>
>> 1) uv_poll_stop(): The uv_poll_t handle is bound to socket descriptor N.
>>    This should free socket descriptor N.
>> 2) uv_accept(): Bound to socket descriptor N as set by uv_fileno().
>>    Socket descriptor N should be newly allocated.
>> 3) uv_poll_init(): Returns EBADF with socket descriptor N and 
>> Poll_Loop.watchers[N] is NULL.
>>    Something is wrong !
>>
>> B) The commented syslog() diagnostic sequence from the instrumented code.
>> **************************************************************************
>> * Stop Polling - Connection 1 Fd 21. This frees the FD.
>>
>> Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 21] - IO_TASK - 
>> uv_poll_stop() - T_POLL_DISMANTLE_REQ
>>
>> * Stop Polling - Connection 0 Fd 20. This frees the FD
>>
>> Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 0 fd 20] - IO_TASK - 
>> uv_poll_stop() - T_POLL_DISMANTLE_REQ
>>
>> * New Connection - Connection 1 Fd 20. This is fine since uv_poll_stop() 
>>   was called for Fd 20, which freed it, and it is newly allocated with 
>>   uv_accept() and obtained with uv_fileno().
>>
>> Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 20] - IO_TASK - 
>> uv_accept() + uv_fileno()
>>
>> * However we crash because uv_poll_init() returned EBADF !
>>
>> Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 20] - IO_TASK - 
>> uv_poll_init() - T_POLL_CONFIG_REQ
>> Apr 30 15:46:08 mobius pexd[30963]: IO_TASK: POLL_PROXY -  Polling 
>> Initialization Error -9, EBADF
>>
>> C) The edited associated code segments are as follows.
>> ********************************************************
>> 1) Prelude to uv_poll_stop() in poll_proxy() in the IO_Task() thread. 
>> (network_io.c)
>>
>> ------------------------------------------------------------------------------------
>>             case T_POLL_DISMANTLE_REQ:
>>                 ppdata = (uv_handle_t **) &msg->buffer[0];
>>                 poll_handle = (uv_poll_t *) *ppdata;
>>
>> syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_poll_stop() - 
>> T_POLL_DISMANTLE_REQ\n", cdesc->index, cdesc->fd);
>>
>>                 if( (r = uv_poll_stop(poll_handle)) )
>>                   {
>>                     fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll 
>> Stop Error %d, %s",
>>                          r, uv_err_name(r) );
>>                   }
>>                 //
>>                 // The callback routine notifies the Protocol_Task() when 
>> everything is done.
>>                 //
>>                 poll_handle->data = (void *) cdesc;
>>                 uv_close((uv_handle_t *) poll_handle, 
>> async_close_callback);
>>
>>
>> 2) Connection acceptance in make_incoming_connection() in the main() 
>> thread. (main.c)
>>
>> ---------------------------------------------------------------------------------------
>>         conn_handle = (uv_tcp_t *) VALLOC(sizeof(uv_tcp_t));
>>     uv_tcp_init(&Connect_Loop, conn_handle);
>>
>>     if((r = uv_accept(listen_handle, (uv_stream_t *) conn_handle)) == 0)
>>        {
>>         //
>>         // A new connection occurred.
>>         //
>>             uv_os_fd_t fd;
>> ENTER_MUTEX(&Protocol_D_Mutex);
>>         nsock = N_Sockets++;
>> EXIT_MUTEX(&Protocol_D_Mutex);
>>         //
>>         // Fetch the socket descriptor from the connection handle.
>>         //
>>         uv_fileno((const uv_handle_t*) conn_handle, &fd);
>>         //
>>         // Allocate the connection descriptor.
>>         //
>>         cdesc = ALLOC_CONN_DESC(fd);
>>         if( !cdesc )
>>           {
>>             syslog(LOG_DEBUG, "MAIN_TASK: CONNECTION REJECTION - No 
>> Connection Descriptors, N =  %d", nsock);
>>
>> ENTER_MUTEX(&Protocol_D_Mutex);
>>             N_Sockets--;
>>             close(fd);
>> EXIT_MUTEX(&Protocol_D_Mutex);
>>             uv_close((uv_handle_t *) conn_handle, forced_close_callback);
>>             return;
>>           }
>>
>> syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_accept() + 
>> uv_fileno()\n", cdesc->index, cdesc->fd);
>>         //
>>         // Save the connection handle and start polling.
>>         //
>>         cdesc->conn_handle = (uv_tcp_t *) conn_handle;
>>
>>         syslog(LOG_INFO, "MAIN_TASK: NEW CONNECTION ESTABLISHED - CONN %d 
>> FD %d", cdesc->index, cdesc->fd);
>>         //
>>         // Set up epoll() plumbing by sending a message to IO_Task();
>>         //
>>         bzero((void *) &poll_data, sizeof(ASYNC_DATA));
>>         poll_data.type = T_POLL_CONFIG_REQ;
>>         poll_data.async_handle = &IO_Task_Async_Handle;
>>         poll_data.async_q = &IO_Task_Input_Q;
>>         poll_data.object_handle = NULL;
>>         //
>>         // The T_POLL_CONFIG_RSP message will be sent to the 
>> Protocol_Task() which
>>         // is in S_IDLE state.
>>         //
>>         send_async_msg(cdesc, &poll_data);
>>       }
>>
>> 3) Prelude to uv_poll_init() in poll_proxy() in the IO_Task() thread. 
>> (network_io.c)
>>
>> ------------------------------------------------------------------------------------
>>             case T_POLL_CONFIG_REQ:
>>                 //
>>                 // Main process request to start Libuv/epoll() operation.
>>                 //
>>                 poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));
>>
>> syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_poll_init() - 
>> T_POLL_CONFIG_REQ\n", cdesc->index, cdesc->fd);
>>                 if( (r = uv_poll_init(&Poll_Loop, poll_handle, 
>> cdesc->fd)) != 0)
>>                   {
>>
>>                     fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY -  Polling 
>> Initialization Error %d, %s",
>>                          r, uv_err_name(r));
>>                   }
>>
>>
>>
>> D) Stack backtrace.
>>
>> (gdb) bt
>> #0  0x00007f62c47b0267 in __GI_raise (sig=sig@entry=6) at 
>> ../sysdeps/unix/sysv/linux/raise.c:55
>> #1  0x00007f62c47b1eca in __GI_abort () at abort.c:89
>> #2  0x000000000040d411 in fatal_error (log_prio=2, 
>>     fmt=0x424e28 "IO_TASK: POLL_PROXY -  Polling Initialization Error %d, 
>> %s") at misc.c:135
>> #3  0x0000000000405b6d in poll_proxy (handle=0x641f80 
>> <IO_Task_Async_Handle>) at network_io.c:291
>> #4  0x00000000004114cb in uv__async_io (loop=0xc26800 <Poll_Loop>, 
>> w=<optimized out>, events=<optimized out>)
>>     at src/unix/async.c:163
>> #5  0x000000000041d255 in uv__io_poll (loop=loop@entry=0xc26800 
>> <Poll_Loop>, timeout=<optimized out>)
>>     at src/unix/linux-core.c:462
>> #6  0x0000000000411c48 in uv_run (loop=0xc26800 <Poll_Loop>, 
>> mode=UV_RUN_DEFAULT) at src/unix/core.c:385
>> #7  0x0000000000405835 in IO_Task (arg=0x0) at network_io.c:100
>> #8  0x00007f62c4b4c6aa in start_thread (arg=0x7f62bbfff700) at 
>> pthread_create.c:333
>> #9  0x00007f62c4881eed in clone () at 
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> At this point cdesc->fd is 20 and Poll_Loop.watchers[20] is NULL.
>> On 04/30/2021 11:48 AM, Paul Romero wrote:
>>
>> Hi:
>>
>> My guess is that there is a subtle, probably timing related, problem in 
>> uv_poll_stop()
>> simply because I can't think of any other relevant Libuv routines that 
>> are explicitly
>> called after uv_poll_start() succeeds.  What do you think ? Of course, 
>> there could
>> be a problem in the way connections are rejected. All that code is in the 
>> make_incoming_connection()
>> callback routine in main.c
>>
>> Best Regards,
>>
>> Paul R.
>>
>>
>> On 04/29/2021 03:14 PM, Paul Romero wrote:
>>
>> Hi:
>>
>> I reattached the following files with some confusing debugging stuff 
>> removed and
>> a dump using them follows.
>>
>> * main.c
>> * network_io.c
>> * rx_fsm.c
>> * async.c
>>
>> Again this is the relevant code segment.
>>
>>     case T_POLL_CONFIG_REQ:
>>         //
>>         // Main process request to start Libuv/epoll() operation.
>>         //
>>         poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));
>>
>>         if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
>>           {
>>
>>             fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY -  Polling 
>> Initialization Error %d, %s",
>>                  r, uv_err_name(r));
>>           }
>>
>> This is the crash diagnostic.
>>
>> IO_TASK: POLL_PROXY -  Polling Initialization Error -17, EEXIST
>> Aborted (core dumped)
>>
>> And this is the gdb backtrace, which is typical of the problem.
>>
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> Core was generated by `./pexd'.
>> Program terminated with signal SIGABRT, Aborted.
>> #0  0x00007ff1da90b267 in __GI_raise (sig=sig@entry=6) at 
>> ../sysdeps/unix/sysv/linux/raise.c:55
>> 55    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  0x00007ff1da90b267 in __GI_raise (sig=sig@entry=6) at 
>> ../sysdeps/unix/sysv/linux/raise.c:55
>> #1  0x00007ff1da90ceca in __GI_abort () at abort.c:89
>> #2  0x000000000040d379 in fatal_error (log_prio=2, 
>>     fmt=0x424ce0 "IO_TASK: POLL_PROXY -  Polling Initialization Error %d, 
>> %s") at misc.c:135
>> #3  0x0000000000405b21 in poll_proxy (handle=0x641f80 
>> <IO_Task_Async_Handle>) at network_io.c:287
>> #4  0x000000000041142b in uv__async_io (loop=0xc26800 <Poll_Loop>, 
>> w=<optimized out>, events=<optimized out>)
>>     at src/unix/async.c:163
>> #5  0x000000000041d1b5 in uv__io_poll (loop=loop@entry=0xc26800 
>> <Poll_Loop>, timeout=<optimized out>)
>>     at src/unix/linux-core.c:462
>> #6  0x0000000000411ba8 in uv_run (loop=0xc26800 <Poll_Loop>, 
>> mode=UV_RUN_DEFAULT) at src/unix/core.c:385
>> #7  0x000000000040580f in IO_Task (arg=0x0) at network_io.c:100
>> #8  0x00007ff1daca76aa in start_thread (arg=0x7ff1d5a7c700) at 
>> pthread_create.c:333
>> #9  0x00007ff1da9dceed in clone () at 
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> At this point cdesc->fd is 22 and Poll_Loop.watchers[22] = 0x17ef458, a 
>> valid address ?
>>
>> On 04/29/2021 09:09 AM, Paul Romero wrote:
>>
>> Hi:
>>
>> The most common symptom of the problem is that uv_poll_init() fails, 
>> which means the problem must be related to the uv_loop_t Poll_Loop. The 
>> relevant code segment is as follows.
>>
>>    poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));
>>
>>    if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
>>      {
>>           fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY -  Polling 
>> Initialization Error %d, %s",
>>                              r, uv_err_name(r));
>>       }
>>
>> The following files illustrate the problem and are attached to this 
>> message. Please advise me
>> how to attach files to the list directly.
>>
>> *  trace_badf.txt: A stack trace when the error code is EBADF.
>> *  trace_eexist.txt: A stack trace when the error code is EEXIST.
>> * network_io.c: The IO_Task() thread code. The problem occurs in the 
>> poll_proxy() routine.
>> * main.c:  The main() process/thread code where most initialization 
>> occurs.
>> * transmit.c: The Transmit_Task() thread.
>> * rx_fsm.c: The heart of the protocol task code.
>>
>>
>> You are right that only the IO_Task() performs polling related operations.
>> However, please note that the Transmit_Task() thread uses the underlying 
>> TCP socket descriptor
>> to send data to the network with Linux send(). (See the TX_DATA() 
>> routine.) Also,
>> USE_SSL and USE_SCRAM are not defined.
>>
>> I will send more information if I am able to reproduce the problem at 
>> another point.
>> Also, I have seen the issue you mentioned before and suspect it may be 
>> relevant.
>> Please let me know if you need anything else.
>>
>>
>> Best Regards,
>>
>> Paul R.
>>
>>
>>
>>
>>
>>
>> On 04/28/2021 08:52 PM, Jameson Nash wrote:
>>
>> I'm assuming from your description that you've ensured all libuv calls 
>> happen from exactly one thread (except uv_async_send). It sounds like 
>> you're describing the issue that  
>> https://github.com/libuv/libuv/commit/c9406ba0e3d67907c1973a71968b89a6bd83c63c
>>  
>> was intended to fix, which was included in v1.41.0
>>
>> Note that the Poll_Loop.watchers[N] is not always required to be NULL, so 
>> perhaps you've found a case where it is still incorrectly expecting that? 
>> You might want to file an issue with code to reproduce the problem, or at 
>> least an `rr` trace dump of the code that hits problem (note that `rr` 
>> could potentially upload all of your hard drive, so be careful what you are 
>> submitting).
>>
>> On Wed, Apr 28, 2021 at 11:13 PM [email protected] <
>> [email protected]> wrote:
>>
>>> Hi Folks:
>>>
>>> I am experiencing an intermittent problem with uv_poll_init() after the 
>>> successful establishment and release
>>> of multiple concurrent TCP connections. I am not sure if this problem is 
>>> due to a bug, which may
>>> be corrected in another Libuv release, or if I am doing something wrong 
>>> when releasing the poll handle.
>>> Do you have insight about the cause ? The details I consider most 
>>> important follow.
>>>
>>> The uv_poll_t callback function reads incoming TCP data as follows.
>>>
>>>     if(events & UV_READABLE)
>>>       {
>>>         CONN_DESC *cdesc = (CONN_DESC *)  poll_handle->data;
>>>         n = recv(cdesc->fd, cdesc->rx_buf, RX_BUF_SIZE, MSG_DONTWAIT);
>>>
>>> NOTE: I am using Libuv version 1.41.0 running on Ubuntu Linux version 
>>> 15.04.
>>>
>>> The problem is that uv_poll_init() fails, normally with the EEXIST or 
>>> EBADF error code, and
>>> my investigation indicates the uv_loop_t Poll_Loop.watchers[N] field is 
>>> not NULL when it should be,
>>> where N is the TCP socket descriptor. It occurs immediately after the 
>>> uv_poll_t poll_handle is allocated.
>>> (There is exactly one TCP socket descriptor per poll handle.) The call 
>>> to uv_poll_init() is as follows
>>> and the socket descriptor is obtained with uv_fileno().
>>>
>>>     if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
>>>        {
>>>         fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY -  Polling 
>>> Initialization Error %d, %s",
>>>              r, uv_err_name(r));
>>>        }
>>>
>>> It occurs in the IO_Task() thread when multiple TCP socket descriptors 
>>> are in use. The IO_Task() releases the poll_handle with the following 
>>> code sequence when it is notified that polling should stop via a Libuv 
>>> async message from the Protocol_Task() thread.
>>>
>>>     if( (r = uv_poll_stop(poll_handle)) )
>>>       {
>>>         fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll Stop Error %d, 
>>> %s",
>>>              r, uv_err_name(r) );
>>>       }
>>>
>>>     poll_handle->data = (void *) cdesc;
>>>     uv_close((uv_handle_t *) poll_handle, async_close_callback);
>>>
>>> The actual release occurs in async_close_callback() as follows, and the 
>>> Protocol_Task() releases the TCP socket descriptor with a normal Linux 
>>> close() after it receives the T_DISMANTLE_RSP response message.
>>>     
>>>     VFREE((UCHAR *) handle);
>>>     //
>>>     // Send a notification message to the Protocol_Task.
>>>     //
>>>     msg = MSG_ALLOC(0, FALSE);
>>>     msg->class = C_NOTIFY;
>>>     msg->type = T_DISMANTLE_RSP;
>>>     msg->info = 0;
>>>
>>>     SendProtoMsg(cdesc, msg);
>>>
>>> I think the underlying cause is that if a new TCP connection uses the 
>>> same TCP socket descriptor as one released with uv_poll_stop() and 
>>> uv_close(), the call to uv_poll_init() occurs before the socket closure 
>>> has propagated into the uv_loop_t Poll_Loop.
>>>
>>> Best Regards,
>>>
>>> Paul Romero
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "libuv" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/libuv/c3290a76-ab6b-42ad-8540-33021c6188b9n%40googlegroups.com
>>> .
>>>
>>
>>
>> -- 
>>
>>
>> Paul Romero
>> -----------
>> RCOM Communications Software
>> EMAIL: [email protected]
>> PHONE: (510)482-2769
>>
>>
>>
>>
>>
>>
