uschindler commented on PR #12535:
URL: https://github.com/apache/lucene/pull/12535#issuecomment-1704953411

   Hi,
   I checked the code again:
   
   I think the timeout was added to make sure the server ends if one of the 
clients does not start at all or fails to start. The server is written fine, it 
waits for exactly `n` clients to start. If one of the client will crush before 
connecting the server will hang forever, so there should really be a timeout! 
If 30 seconds is too short, I think we have to dig into the code and figure out 
why the client did not start. Maybe there's startup logging missing or we do 
not print it out.
   
   So in my opinion, we should better figure out why the client did not start. 
Maybe becuase something ran out of file handles?
   
   So my ideas:
   - +1 to remove the SO_REUSEADDR
   - -1 to remove timeout completely as it will cause the test server to hang 
if any of the clients does not successfully start
   
   We should maybe change the Lucene Unit Test  that uses the server class to 
better log the stdout/stderr of clients and have some more sanity checks. The 
30 seconds timeout should be fine. If we figure out that in reality the clients 
are too slow to start, we can raise the timeout. But actually it is very 
unlikely that the very first client takes so long. As the server just accepts 
`n` clients it waits 30 seconds *per* client. Once the firts client started it 
is also unlikly that later ones take aditional 30 seconds.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to