Hi:

An addition to my last message: when uv__nonblock() fails, it indicates that the Linux FIONBIO ioctl() failed. What would cause setting non-blocking mode on a descriptor to fail?
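
For reference, the failure modes I can find for ioctl(FIONBIO) on Linux are EBADF (the descriptor is closed or stale) and ENOTTY (the descriptor does not support that ioctl, which is the case the uv__nonblock_fcntl() fallback works around). A minimal sketch, not from my server code, that reproduces the EBADF case:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int on = 1;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        /* On a live socket, FIONBIO succeeds. */
        printf("open fd:   %s\n",
               ioctl(fd, FIONBIO, &on) ? strerror(errno) : "ok");

        /* After close(), the same call fails with EBADF -- exactly what a
         * stale or already-closed descriptor would produce. */
        close(fd);
        printf("closed fd: %s\n",
               ioctl(fd, FIONBIO, &on) ? strerror(errno) : "ok");

        return 0;
    }

If the descriptor can be closed, or reused and closed again, between uv_fileno() and uv_poll_init(), that would explain an intermittent EBADF here.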
Best Regards,

Paul R.

On Friday, May 7, 2021 at 9:11:00 PM UTC-7 [email protected] wrote:
> Hi:
>
> I was able to track down and fix the problem causing uv_poll_init() to
> return EEXIST. However, uv_poll_init() still returns EBADF intermittently,
> and I traced the problem to the following code segment but don't fully
> understand why.
>
> The EBADF is generated by the following code segment in uv_poll_init() in
> src/unix/poll.c, around line 90.
>
>     /* If ioctl(FIONBIO) reports ENOTTY, try fcntl(F_GETFL) + fcntl(F_SETFL).
>      * Workaround for e.g. kqueue fds not supporting ioctls.
>      */
>     err = uv__nonblock(fd, 1);
>     if (err == UV_ENOTTY)
>       if (uv__nonblock == uv__nonblock_ioctl)
>         err = uv__nonblock_fcntl(fd, 1);
>
> Do you have any insight about this?
>
> I was able to verify that the following sequence of events can occur, and
> I strongly suspect it can lead to corruption. This is what causes EEXIST
> to be returned by uv_poll_init().
>
> 1) The following sequence of calls executes because it has been decided
> that polling needs to be terminated for a particular TCP connection.
>
>     uv_poll_stop(poll_handle_1);
>     uv_close((uv_handle_t *) poll_handle_1, close_callback);
>
> 2) Now it is decided that polling needs to be terminated for another TCP
> connection, and the following sequence of calls executes.
>
>     uv_poll_stop(poll_handle_2);
>     uv_close((uv_handle_t *) poll_handle_2, close_callback);
>
> 3) The close_callback() function executes, but we don't know whether it
> executes first for the uv_close() call made with poll_handle_1 or the one
> made with poll_handle_2.
>
> Handling the problem requires knowing that close_callback() won't execute
> until uv_run() executes. (I don't think this is documented.)
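>
> A minimal sketch that demonstrates this deferral in isolation (not from
> my server code; a timer handle is used only because it needs no callback
> at init time, and any handle type behaves the same way):
>
>     #include <assert.h>
>     #include <stdio.h>
>     #include <uv.h>
>
>     static int closed = 0;
>
>     static void close_cb(uv_handle_t *handle)
>     {
>         closed = 1;
>     }
>
>     int main(void)
>     {
>         uv_loop_t loop;
>         uv_timer_t timer;
>
>         uv_loop_init(&loop);
>         uv_timer_init(&loop, &timer);
>
>         uv_close((uv_handle_t *) &timer, close_cb);
>         assert(closed == 0);            /* only queued at this point */
>
>         uv_run(&loop, UV_RUN_DEFAULT);  /* the callback runs from here */
>         assert(closed == 1);
>
>         uv_loop_close(&loop);
>         printf("close callback ran only after uv_run()\n");
>         return 0;
>     }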
>
> The solution is that when a uv_poll_stop()/uv_close() sequence is started,
> do not start another such sequence until both close_callback() and
> uv_run() have executed.
>
> In my case, this requires a variable, Poll_Releasing, which is set to TRUE
> when such a sequence has been initiated and to FALSE when close_callback()
> executes. It also requires putting T_POLL_DISMANTLE_REQ messages that
> arrive while Poll_Releasing is TRUE on a temporary queue. Then, when the
> IO_Task_Input_Q is empty, the messages from the temporary queue are moved
> to IO_Task_Input_Q and the poll_proxy() routine returns, which allows
> uv_run() to execute.
>
> The relevant code segments are as follows.
>
> 1) In the IO_Task(), the poll_proxy() routine in network_io.c.
>
>     ROUTINE PRIVATE void poll_proxy(uv_async_t *handle)
>     {
>         CONN_DESC *cdesc;
>         MSG *msg, *rsp;
>         MSG_FIFO Poll_Release_Q;
>         uv_poll_t *poll_handle;
>         uv_handle_t **ppdata;
>         int r;
>         BOOL done;
>
>         //
>         // Do nothing if execution is already in progress.
>         //
>         if(Poll_Proxy_Busy)
>             return;
>
>         Poll_Proxy_Busy = TRUE;
>         Poll_Release_Q.head = Poll_Release_Q.tail = NULL;
>         //
>         // Handle messages from other tasks.
>         //
>         done = FALSE;
>         while(done != TRUE)
>         {
>             ENTER_MUTEX(&Async_Mutex);
>             msg = remove_msg(&IO_Task_Input_Q);
>             EXIT_MUTEX(&Async_Mutex);
>
>             if(msg)
>             {
>                 cdesc = (CONN_DESC *) msg->conn_desc;
>                 syslog(LOG_DEBUG, "IO_TASK(Proxy): RX EVENT - CONN[%d] Type %d\n",
>                        cdesc->index, msg->type);
>
>                 switch(msg->type)
>                 {
>                 case T_POLL_CONFIG_REQ:
>                     //
>                     // Main process request to start Libuv/epoll() operation.
>                     //
>                     ... Code to handle message ...
>                     break;
>                 case T_POLL_DISMANTLE_REQ:
>                     //
>                     // If a poll handle is being dismantled, we must wait
>                     // for the uv_close() callback routine to execute
>                     // before handling this message. For the callback
>                     // routine to execute, uv_run() must be called.
>                     //
>                     if(Poll_Releasing)
>                     {
>                         //
>                         // Save the message on a temporary queue and move
>                         // the contents of that queue to the task input
>                         // queue when this routine returns.
>                         //
>                         insert_msg(&Poll_Release_Q, msg);
>                         msg = NULL;
>                         break;
>                     }
>                     else
>                         Poll_Releasing = TRUE;
>                     //
>                     // Protocol_Task() request to cease Libuv/epoll()
>                     // operation and release the poll handle and its
>                     // resources.
>                     //
>                     ppdata = (uv_handle_t **) &msg->buffer[0];
>                     poll_handle = (uv_poll_t *) *ppdata;
>
>                     if( (r = uv_poll_stop(poll_handle)) )
>                     {
>                         fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll Stop Error %d, %s",
>                                     r, uv_err_name(r) );
>                     }
>                     //
>                     // The callback routine notifies the Protocol_Task()
>                     // when everything is done.
>                     //
>                     poll_handle->data = (void *) cdesc;
>                     uv_close((uv_handle_t *) poll_handle, async_close_callback);
>                     break;
>                 default:
>                     fatal_error(LOG_EMERG, "IO_TASK: POLL_PROXY - Invalid Message = %d",
>                                 msg->type);
>                 }
>
>                 if(msg)
>                     MSG_FREE(msg);
>             }
>             else
>             {
>                 done = TRUE;
>                 //
>                 // The input queue is now empty. Move the messages saved
>                 // on the temporary queue to it.
>                 //
>                 while( (msg = remove_msg(&Poll_Release_Q)) )
>                 {
>                     ENTER_MUTEX(&Async_Mutex);
>                     insert_msg(&IO_Task_Input_Q, msg);
>                     EXIT_MUTEX(&Async_Mutex);
>                 }
>             }
>         }
>
>         Poll_Proxy_Busy = FALSE;
>
>         return;
>     }
>
> 2) In the IO_Task(), the async_close_callback() routine in async.c.
>
>     ROUTINE void async_close_callback(uv_handle_t *handle)
>     {
>         MSG *msg;
>         CONN_DESC *cdesc = (CONN_DESC *) handle->data;
>         int type = handle->type;
>
>         VFREE((UCHAR *) handle);
>         //
>         // Send a notification message to the Protocol_Task().
>         //
>         msg = MSG_ALLOC(0, FALSE);
>         msg->class = C_NOTIFY;
>         msg->type = T_DISMANTLE_RSP;
>         msg->info = 0;
>
>         SendProtoMsg(cdesc, msg);
>         //
>         // Signal that the poll handle is released.
>         //
>         if(type == UV_POLL)
>         {
>             Poll_Releasing = FALSE;
>             syslog(LOG_INFO, "IO_Task: CONN[%d] Poll Release Complete", cdesc->index);
>         }
>
>         return;
>     }
>
> On 05/04/2021 06:21 PM, Paul Romero wrote:
>
> Hi:
>
> I have two questions about Libuv that I haven't seen addressed by the
> documentation.
>
> First, when uv_poll_init() returns EEXIST, could it be indicative of a
> transient problem rather than an operational problem or bug?
>
> Second, this question is related to concurrency but does not involve
> concurrent execution. In fact, all the Libuv calls execute serially.
>
> 1) The following sequence of calls executes because it has been decided
> that polling needs to be terminated for a particular TCP connection.
>
>     uv_poll_stop(poll_handle_1);
>     uv_close((uv_handle_t *) poll_handle_1, close_callback);
>
> 2) Now it is decided that polling needs to be terminated for another TCP
> connection, and the following sequence of calls executes.
>
>     uv_poll_stop(poll_handle_2);
>     uv_close((uv_handle_t *) poll_handle_2, close_callback);
>
> 3) The close_callback() function executes, but we don't know whether it
> executes first for the uv_close() call made with poll_handle_1 or the one
> made with poll_handle_2.
>
> Can this happen and result in corruption?
>
> Best Regards,
>
> Paul R.
>
> On 05/02/2021 06:51 AM, Jameson Nash wrote:
>
> You might want to try compiling your application with Clang TSan.
> Although, since libuv is a single-threaded library (per uv_loop_t), any
> access from multiple threads is generally just forbidden.
> On Sun, May 2, 2021 at 2:24 AM Paul Romero <[email protected]> wrote:
>
>> Hi:
>>
>> A trace giving a more refined picture of the problem, with more details,
>> is attached to this note. It dumps what I consider to be the significant
>> fields of the uv_loop_t structure (the Poll_Loop in my code) and of the
>> dynamically allocated uv_poll_t structure, surrounding the essential
>> Libuv API routine calls. I think the format of the dump is fairly
>> intuitive, but please inform me if you disagree.
>>
>> My comments about what is happening are identified by the string
>> "COMMENT:" and the API calls are identified by the string "MPOLL". The
>> trace is redacted to eliminate irrelevant events.
>>
>> Please let me know if you see something amiss about the data structure
>> fields at any point.
>>
>> My server code is based on a simple event-oriented architecture and uses
>> the following 5 concurrent tasks.
>>
>> 1) main(): Handles incoming connections and Linux signals. (Libuv based.)
>> 2) IO_Task(): Handles incoming data from the network. (Libuv based.)
>> 3) Protocol_Task(): Does most of the work. (Message based.)
>> 4) DB_Task(): Does database insertions. (Message based.)
>> 5) Timer_Task(): Handles timer management. (Mostly Libuv based.)
>>
>> I can send you my server code and a test tool if it would be useful to
>> you. However, it may take a few days to put it in an easily usable form.
>>
>> As previously mentioned, uv_accept() executes in the main() task and the
>> rest of the polling-related Libuv API routines execute in the IO_Task().
>>
>> Finally, remember the problem is intermittent. Generally, it takes about
>> 45 minutes to 3 hours of constant concurrent connections for the crash
>> to occur.
>>
>> Best Regards,
>>
>> Paul R.
>>
>> On 05/01/2021 12:01 AM, Paul Romero wrote:
>>
>> Hi:
>>
>> I may have overlooked the following factor.
>>
>> We know that Libuv is not reentrant and not really multi-task safe. I
>> suspect, but don't know, that the internal Libuv data structures
>> attached to the uv_loop_t structure (Poll_Loop in my case) are not
>> protected by internal Libuv mutexes.
>>
>> 1) uv_accept() and uv_fileno() execute in the main() thread.
>>
>> 2) uv_poll_init(), uv_poll_start() and uv_poll_stop() execute in the
>> IO_Task() thread.
>>
>> That means uv_accept() and the Libuv API routines in 2, above, can
>> execute concurrently. Is it possible that such concurrency could damage
>> the internal Libuv data structures attached to the uv_loop_t structure?
>> (Again, Poll_Loop in my case.)
>>
>> Best Regards,
>>
>> Paul R.
>>
>> NOTE: I did a few experiments and noticed that calling uv_close() in a
>> Mutex often prevents its callback routine from executing.
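>>
>> For reference, a standalone sketch of the pattern that keeps all loop
>> access on one thread -- essentially what poll_proxy() already does,
>> reduced to its essentials; the names here are illustrative, not from the
>> attached code. Worker threads never touch the loop: they enqueue a
>> request under a mutex and wake the loop with uv_async_send(), the one
>> libuv call documented as safe to make from another thread. All
>> uv_poll_*() and uv_close() calls then happen in the async callback, on
>> the loop thread.
>>
>>     #include <stdlib.h>
>>     #include <uv.h>
>>
>>     typedef struct req_s
>>     {
>>         int fd;                 /* descriptor to start/stop polling */
>>         struct req_s *next;
>>     } req_t;
>>
>>     static uv_async_t wakeup;
>>     static uv_mutex_t q_lock;
>>     static req_t *q_head;
>>
>>     /* Runs on the loop thread only. */
>>     static void drain_queue(uv_async_t *handle)
>>     {
>>         req_t *req;
>>
>>         for(;;)
>>         {
>>             uv_mutex_lock(&q_lock);
>>             req = q_head;
>>             if(req)
>>                 q_head = req->next;
>>             uv_mutex_unlock(&q_lock);
>>             if(!req)
>>                 break;
>>             //
>>             // uv_poll_init()/uv_poll_start()/uv_poll_stop()/uv_close()
>>             // for req->fd go here, safely on the loop thread.
>>             //
>>             free(req);
>>         }
>>     }
>>
>>     /* May be called from any thread. */
>>     void submit(int fd)
>>     {
>>         req_t *req = malloc(sizeof(*req));
>>
>>         req->fd = fd;
>>         uv_mutex_lock(&q_lock);
>>         req->next = q_head;
>>         q_head = req;
>>         uv_mutex_unlock(&q_lock);
>>         uv_async_send(&wakeup); /* thread-safe wakeup of the loop */
>>     }
>>
>>     int main(void)
>>     {
>>         uv_loop_t loop;
>>
>>         uv_loop_init(&loop);
>>         uv_mutex_init(&q_lock);
>>         uv_async_init(&loop, &wakeup, drain_queue);
>>         //
>>         // Worker threads would call submit(fd) from here on. The async
>>         // handle keeps the loop alive, so uv_run() blocks until it is
>>         // closed.
>>         //
>>         return uv_run(&loop, UV_RUN_DEFAULT);
>>     }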
>>
>> On 04/30/2021 04:59 PM, Paul Romero wrote:
>>
>> Hi:
>>
>> I think I've isolated the problem and it appears to be in uv_poll_stop().
>>
>> In particular, if uv_accept() is called immediately after uv_poll_stop(),
>> and it reallocates the same socket descriptor just freed by
>> uv_poll_stop(), then uv_poll_init() fails with the error EEXIST or
>> EBADF. This must have something to do with the uv_loop_t handle passed
>> to uv_poll_init(), called Poll_Loop in my code.
>>
>> I had to instrument my code to keep track of Libuv API calls to see this.
>>
>> The following explanatory items follow.
>>
>> A - The sequence of calls that produces the problem.
>> B - Commented syslog() diagnostics.
>> C - The relevant code segments.
>> D - The gdb stack backtrace.
>>
>> Best Regards,
>>
>> Paul R.
>>
>> A) The sequence of relevant calls is as follows; the problem only occurs
>> with this exact sequence of events.
>> ****************************************************************************************
>>
>> 1) uv_poll_stop(): The uv_poll_t handle is bound to socket descriptor N.
>>    This should free socket descriptor N.
>> 2) uv_accept(): Bound to socket descriptor N as set by uv_fileno().
>>    Socket descriptor N should be newly allocated.
>> 3) uv_poll_init(): Returns EBADF with socket descriptor N, and
>>    Poll_Loop.watchers[N] is NULL. Something is wrong!
>>
>> B) The commented syslog() diagnostic sequence from the instrumented code.
>> **************************************************************************
>> * Stop Polling - Connection 1 Fd 21. This frees the FD.
>>
>> Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 21] - IO_TASK - uv_poll_stop() - T_POLL_DISMANTLE_REQ
>>
>> * Stop Polling - Connection 0 Fd 20. This frees the FD.
>>
>> Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 0 fd 20] - IO_TASK - uv_poll_stop() - T_POLL_DISMANTLE_REQ
>>
>> * New Connection - Connection 1 Fd 20. This is fine, since uv_poll_stop()
>>   was called for Fd 20, which freed it, and it is newly allocated with
>>   uv_accept() and obtained with uv_fileno().
>>
>> Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 20] - IO_TASK - uv_accept() + uv_fileno()
>>
>> * However, we crash because uv_poll_init() returned EBADF!
>>
>> Apr 30 15:46:08 mobius pexd[30963]: POLL_CALL[i 1 fd 20] - IO_TASK - uv_poll_init() - T_POLL_CONFIG_REQ
>> Apr 30 15:46:08 mobius pexd[30963]: IO_TASK: POLL_PROXY - Polling Initialization Error -9, EBADF
>>
>> C) The edited associated code segments are as follows.
>> ********************************************************
>> 1) Prelude to uv_poll_stop() in poll_proxy() in the IO_Task() thread.
>>    (network_io.c)
>> ------------------------------------------------------------------------------------
>>
>>     case T_POLL_DISMANTLE_REQ:
>>         ppdata = (uv_handle_t **) &msg->buffer[0];
>>         poll_handle = (uv_poll_t *) *ppdata;
>>
>>         syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_poll_stop() - T_POLL_DISMANTLE_REQ\n",
>>                cdesc->index, cdesc->fd);
>>
>>         if( (r = uv_poll_stop(poll_handle)) )
>>         {
>>             fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll Stop Error %d, %s",
>>                         r, uv_err_name(r) );
>>         }
>>         //
>>         // The callback routine notifies the Protocol_Task() when
>>         // everything is done.
>>         //
>>         poll_handle->data = (void *) cdesc;
>>         uv_close((uv_handle_t *) poll_handle, async_close_callback);
>>
>> 2) Connection acceptance in make_incoming_connection() in the main()
>>    thread. (main.c)
>> ---------------------------------------------------------------------------------------
>>
>>     conn_handle = (uv_tcp_t *) VALLOC(sizeof(uv_tcp_t));
>>     uv_tcp_init(&Connect_Loop, conn_handle);
>>
>>     if((r = uv_accept(listen_handle, (uv_stream_t *) conn_handle)) == 0)
>>     {
>>         //
>>         // A new connection occurred.
>>         //
>>         uv_os_fd_t fd;
>>         ENTER_MUTEX(&Protocol_D_Mutex);
>>         nsock = N_Sockets++;
>>         EXIT_MUTEX(&Protocol_D_Mutex);
>>         //
>>         // Fetch the socket descriptor from the connection handle.
>>         //
>>         uv_fileno((const uv_handle_t *) conn_handle, &fd);
>>         //
>>         // Allocate the connection descriptor.
>>         //
>>         cdesc = ALLOC_CONN_DESC(fd);
>>         if( !cdesc )
>>         {
>>             syslog(LOG_DEBUG, "MAIN_TASK: CONNECTION REJECTION - No Connection Descriptors, N = %d", nsock);
>>
>>             ENTER_MUTEX(&Protocol_D_Mutex);
>>             N_Sockets--;
>>             close(fd);
>>             EXIT_MUTEX(&Protocol_D_Mutex);
>>             uv_close((uv_handle_t *) conn_handle, forced_close_callback);
>>             return;
>>         }
>>
>>         syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_accept() + uv_fileno()\n",
>>                cdesc->index, cdesc->fd);
>>         //
>>         // Save the connection handle and start polling.
>>         //
>>         cdesc->conn_handle = (uv_tcp_t *) conn_handle;
>>
>>         syslog(LOG_INFO, "MAIN_TASK: NEW CONNECTION ESTABLISHED - CONN %d FD %d",
>>                cdesc->index, cdesc->fd);
>>         //
>>         // Set up epoll() plumbing by sending a message to IO_Task().
>>         //
>>         bzero((void *) &poll_data, sizeof(ASYNC_DATA));
>>         poll_data.type = T_POLL_CONFIG_REQ;
>>         poll_data.async_handle = &IO_Task_Async_Handle;
>>         poll_data.async_q = &IO_Task_Input_Q;
>>         poll_data.object_handle = NULL;
>>         //
>>         // The T_POLL_CONFIG_RSP message will be sent to the
>>         // Protocol_Task(), which is in S_IDLE state.
>>         //
>>         send_async_msg(cdesc, &poll_data);
>>     }
>>
>> 3) Prelude to uv_poll_init() in poll_proxy() in the IO_Task() thread.
>>    (network_io.c)
>> ------------------------------------------------------------------------------------
>>
>>     case T_POLL_CONFIG_REQ:
>>         //
>>         // Main process request to start Libuv/epoll() operation.
>>         //
>>         poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));
>>
>>         syslog(LOG_INFO, "POLL_CALL[i %d fd %d] - IO_TASK - uv_poll_init() - T_POLL_CONFIG_REQ\n",
>>                cdesc->index, cdesc->fd);
>>         if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
>>         {
>>             fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s",
>>                         r, uv_err_name(r));
>>         }
>>
>> D) Stack backtrace.
>>
>> (gdb) bt
>> #0  0x00007f62c47b0267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
>> #1  0x00007f62c47b1eca in __GI_abort () at abort.c:89
>> #2  0x000000000040d411 in fatal_error (log_prio=2,
>>     fmt=0x424e28 "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s") at misc.c:135
>> #3  0x0000000000405b6d in poll_proxy (handle=0x641f80 <IO_Task_Async_Handle>) at network_io.c:291
>> #4  0x00000000004114cb in uv__async_io (loop=0xc26800 <Poll_Loop>, w=<optimized out>, events=<optimized out>)
>>     at src/unix/async.c:163
>> #5  0x000000000041d255 in uv__io_poll (loop=loop@entry=0xc26800 <Poll_Loop>, timeout=<optimized out>)
>>     at src/unix/linux-core.c:462
>> #6  0x0000000000411c48 in uv_run (loop=0xc26800 <Poll_Loop>, mode=UV_RUN_DEFAULT) at src/unix/core.c:385
>> #7  0x0000000000405835 in IO_Task (arg=0x0) at network_io.c:100
>> #8  0x00007f62c4b4c6aa in start_thread (arg=0x7f62bbfff700) at pthread_create.c:333
>> #9  0x00007f62c4881eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> At this point cdesc->fd is 20 and Poll_Loop.watchers[20] is NULL.
>>
>> On 04/30/2021 11:48 AM, Paul Romero wrote:
>>
>> Hi:
>>
>> My guess is that there is a subtle, probably timing-related, problem in
>> uv_poll_stop(), simply because I can't think of any other relevant Libuv
>> routines that are explicitly called after uv_poll_start() succeeds. What
>> do you think? Of course, there could be a problem in the way connections
>> are rejected. All that code is in the make_incoming_connection()
>> callback routine in main.c.
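>>
>> One way to rule out the descriptor-reuse timing is to close the raw fd
>> only from the uv_close() callback, so accept() cannot hand the same
>> number back before the old watcher is fully torn down. A sketch, not
>> code from this thread (conn_t and the function names are illustrative):
>>
>>     #include <stdlib.h>
>>     #include <unistd.h>
>>     #include <uv.h>
>>
>>     typedef struct
>>     {
>>         uv_poll_t poll_handle;
>>         int fd;
>>     } conn_t;
>>
>>     static void poll_closed(uv_handle_t *handle)
>>     {
>>         conn_t *conn = (conn_t *) handle->data;
>>
>>         close(conn->fd);    /* the fd number becomes reusable only now */
>>         free(conn);
>>     }
>>
>>     void stop_polling(conn_t *conn)
>>     {
>>         uv_poll_stop(&conn->poll_handle);
>>         conn->poll_handle.data = (void *) conn;
>>         uv_close((uv_handle_t *) &conn->poll_handle, poll_closed);
>>         //
>>         // Do NOT close(conn->fd) here, and do not treat the descriptor
>>         // as free until poll_closed() has run from inside uv_run().
>>         //
>>     }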
>>
>> Best Regards,
>>
>> Paul R.
>>
>> On 04/29/2021 03:14 PM, Paul Romero wrote:
>>
>> Hi:
>>
>> I reattached the following files, with some confusing debugging stuff
>> removed, and a dump using them follows.
>>
>> * main.c
>> * network_io.c
>> * rx_fsm.c
>> * async.c
>>
>> Again, this is the relevant code segment.
>>
>>     case T_POLL_CONFIG_REQ:
>>         //
>>         // Main process request to start Libuv/epoll() operation.
>>         //
>>         poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));
>>
>>         if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
>>         {
>>             fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s",
>>                         r, uv_err_name(r));
>>         }
>>
>> This is the crash diagnostic.
>>
>>     IO_TASK: POLL_PROXY - Polling Initialization Error -17, EEXIST
>>     Aborted (core dumped)
>>
>> And this is the gdb backtrace, which is typical of the problem.
>>
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> Core was generated by `./pexd'.
>> Program terminated with signal SIGABRT, Aborted.
>> #0  0x00007ff1da90b267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
>> 55  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  0x00007ff1da90b267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
>> #1  0x00007ff1da90ceca in __GI_abort () at abort.c:89
>> #2  0x000000000040d379 in fatal_error (log_prio=2,
>>     fmt=0x424ce0 "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s") at misc.c:135
>> #3  0x0000000000405b21 in poll_proxy (handle=0x641f80 <IO_Task_Async_Handle>) at network_io.c:287
>> #4  0x000000000041142b in uv__async_io (loop=0xc26800 <Poll_Loop>, w=<optimized out>, events=<optimized out>)
>>     at src/unix/async.c:163
>> #5  0x000000000041d1b5 in uv__io_poll (loop=loop@entry=0xc26800 <Poll_Loop>, timeout=<optimized out>)
>>     at src/unix/linux-core.c:462
>> #6  0x0000000000411ba8 in uv_run (loop=0xc26800 <Poll_Loop>, mode=UV_RUN_DEFAULT) at src/unix/core.c:385
>> #7  0x000000000040580f in IO_Task (arg=0x0) at network_io.c:100
>> #8  0x00007ff1daca76aa in start_thread (arg=0x7ff1d5a7c700) at pthread_create.c:333
>> #9  0x00007ff1da9dceed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> At this point cdesc->fd is 22 and Poll_Loop.watchers[22] = 0x17ef458, a valid address?
>>
>> On 04/29/2021 09:09 AM, Paul Romero wrote:
>>
>> Hi:
>>
>> The most common symptom of the problem is that uv_poll_init() fails,
>> which means the problem must be related to the uv_loop_t Poll_Loop. The
>> relevant code segment is as follows.
>>
>>     poll_handle = (uv_poll_t *) VALLOC(sizeof(uv_poll_t));
>>
>>     if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
>>     {
>>         fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s",
>>                     r, uv_err_name(r));
>>     }
>>
>> The following files illustrate the problem and are attached to this
>> message. Please advise me how to attach files to the list directly.
>>
>> * trace_badf.txt: A stack trace when the error code is EBADF.
>> * trace_eexist.txt: A stack trace when the error code is EEXIST.
>> * network_io.c: The IO_Task() thread code. The problem occurs in the poll_proxy() routine.
>> * main.c: The main() process/thread code where most initialization occurs.
>> * transmit.c: The Transmit_Task() thread.
>> * rx_fsm.c: The heart of the protocol task code.
>>
>> You are right that only the IO_Task() performs polling-related
>> operations. However, please note that the Transmit_Task() thread uses
>> the underlying TCP socket descriptor to send data to the network with
>> Linux send(). (See the TX_DATA() routine.) Also, USE_SSL and USE_SCRAM
>> are not defined.
>>
>> I will send more information if I am able to reproduce the problem at
>> another point. Also, I have seen the issue you mentioned before and
>> suspect it may be relevant. Please let me know if you need anything else.
>>
>> Best Regards,
>>
>> Paul R.
>>
>> On 04/28/2021 08:52 PM, Jameson Nash wrote:
>>
>> I'm assuming from your description that you've ensured all libuv calls
>> happen from exactly one thread (except uv_async_send). It sounds like
>> you're describing the issue that
>> https://github.com/libuv/libuv/commit/c9406ba0e3d67907c1973a71968b89a6bd83c63c
>> was intended to fix, which was included in v1.41.0.
>>
>> Note that Poll_Loop.watchers[N] is not always required to be NULL, so
>> perhaps you've found a case where it is still incorrectly expecting
>> that? You might want to file an issue with code to reproduce the
>> problem, or at least an `rr` trace dump of the code that hits the
>> problem (note that `rr` could potentially upload all of your hard drive,
>> so be careful what you are submitting).
>>
>> On Wed, Apr 28, 2021 at 11:13 PM [email protected]
>> <[email protected]> wrote:
>>
>>> Hi Folks:
>>>
>>> I am experiencing an intermittent problem with uv_poll_init() after the
>>> successful establishment and release of multiple concurrent TCP
>>> connections. I am not sure if this problem is due to a bug, which may
>>> be corrected in another Libuv release, or if I am doing something wrong
>>> when releasing the poll handle. Do you have insight about the cause?
>>> The details I consider most important follow.
>>>
>>> The uv_poll_t callback function reads incoming TCP data as follows.
>>>
>>>     if(events & UV_READABLE)
>>>     {
>>>         CONN_DESC *cdesc = (CONN_DESC *) poll_handle->data;
>>>         n = recv(cdesc->fd, cdesc->rx_buf, RX_BUF_SIZE, MSG_DONTWAIT);
>>>
>>> NOTE: I am using Libuv version 1.41.0 running on Ubuntu Linux version
>>> 15.04.
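>>>
>>> A fuller sketch of that read path (hedged: CONN_DESC, RX_BUF_SIZE,
>>> on_rx_data() and on_peer_close() stand in for my application's own
>>> definitions), covering the status-error, EOF and EAGAIN cases that a
>>> level-triggered poll callback has to expect:
>>>
>>>     #include <errno.h>
>>>     #include <sys/socket.h>
>>>     #include <uv.h>
>>>
>>>     #define RX_BUF_SIZE 4096
>>>     typedef struct { int fd; char rx_buf[RX_BUF_SIZE]; } CONN_DESC;
>>>
>>>     void on_rx_data(CONN_DESC *cdesc, ssize_t n);  /* application supplied */
>>>     void on_peer_close(CONN_DESC *cdesc);          /* application supplied */
>>>
>>>     static void poll_cb(uv_poll_t *poll_handle, int status, int events)
>>>     {
>>>         CONN_DESC *cdesc = (CONN_DESC *) poll_handle->data;
>>>         ssize_t n;
>>>
>>>         if(status < 0)
>>>         {
>>>             on_peer_close(cdesc);   /* e.g. UV_EBADF: fd went away */
>>>             return;
>>>         }
>>>         if(events & UV_READABLE)
>>>         {
>>>             n = recv(cdesc->fd, cdesc->rx_buf, RX_BUF_SIZE, MSG_DONTWAIT);
>>>             if(n > 0)
>>>                 on_rx_data(cdesc, n);      /* normal data */
>>>             else if(n == 0)
>>>                 on_peer_close(cdesc);      /* orderly shutdown by peer */
>>>             else if(errno != EAGAIN && errno != EWOULDBLOCK)
>>>                 on_peer_close(cdesc);      /* hard socket error */
>>>         }
>>>     }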
>>> The problem is that uv_poll_init() fails, normally with the EEXIST or
>>> EBADF error code, and my investigation indicates the uv_loop_t
>>> Poll_Loop.watchers[N] field is not NULL when it should be, where N is
>>> the TCP socket descriptor. It occurs immediately after the uv_poll_t
>>> poll_handle is allocated. (There is exactly one TCP socket descriptor
>>> per poll handle.) The call to uv_poll_init() is as follows, and the
>>> socket descriptor is obtained with uv_fileno().
>>>
>>>     if( (r = uv_poll_init(&Poll_Loop, poll_handle, cdesc->fd)) != 0)
>>>     {
>>>         fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Polling Initialization Error %d, %s",
>>>                     r, uv_err_name(r));
>>>     }
>>>
>>> It occurs in the IO_Task() thread when multiple TCP socket descriptors
>>> are in use. The IO_Task() releases the poll_handle with the following
>>> code sequence when it is notified, via a Libuv async message from the
>>> Protocol_Task() thread, that polling should stop.
>>>
>>>     if( (r = uv_poll_stop(poll_handle)) )
>>>     {
>>>         fatal_error(LOG_CRIT, "IO_TASK: POLL_PROXY - Poll Stop Error %d, %s",
>>>                     r, uv_err_name(r) );
>>>     }
>>>
>>>     poll_handle->data = (void *) cdesc;
>>>     uv_close((uv_handle_t *) poll_handle, async_close_callback);
>>>
>>> The actual release occurs in async_close_callback() as follows, and the
>>> Protocol_Task() releases the TCP socket descriptor with a normal Linux
>>> close() after it receives the T_DISMANTLE_RSP message.
>>>
>>>     VFREE((UCHAR *) handle);
>>>     //
>>>     // Send a notification message to the Protocol_Task().
>>>     //
>>>     msg = MSG_ALLOC(0, FALSE);
>>>     msg->class = C_NOTIFY;
>>>     msg->type = T_DISMANTLE_RSP;
>>>     msg->info = 0;
>>>
>>>     SendProtoMsg(cdesc, msg);
>>>
>>> I think the underlying cause is that if a new TCP connection uses the
>>> same TCP socket descriptor as one released with uv_poll_stop() and
>>> uv_close(), the call to uv_poll_init() occurs before the socket closure
>>> has propagated into the uv_loop_t Poll_Loop.
>>>
>>> Best Regards,
>>>
>>> Paul Romero
--
You received this message because you are subscribed to the Google Groups "libuv" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/libuv/0c21cbbd-f06f-4758-a013-4e6492040014n%40googlegroups.com.
