Hi Jason, Thanks for the heads up. I look forward to getting rid of fork() there, but there is one thing that's not clear to me. You say that TCPSocket::Accept() calls CloseListenSockets().. I see don't see anything like that in the current code, and I know for a fact that we are able to handle multiple (concurrent) connections, as that is how we run tests on android right now. Could this be some problem with the branch that you were working on?
In general, if you don't need to use the --min-port/--max-port arguments (that is indeed broken in the fork mode), then everything should already be set-up for the parallel remote testing. In fact, unless you have a hard requirement to use --min/max-port then I would advise against it. I think it can cause you a lot of problems down the line. The issue with vending out ports like this is that the "PortCoordinator" does not have control over the whole system, and it cannot prevent some other process from claiming one of the ports that it "owns". So, you may need to implement some retry logic to handle the case when debugserver cannot bind to the port that it was assigned to. I think that a much more reliable and simpler solution would be to have debugserver ask the OS to assign it an unused port (by binding to port zero), and then report that port back to lldb-platform (via the --named-pipe arg -- this is how we use llgs right now). Besides fixing the port allocation problem, this will also ensure that by the time the platform reports the port back to the client, debugserver is already primed and waiting for a connection. I'm not sure if debugserver right now supports the --named-pipe argument, but it should be fairly trivial to add it if needed. Of course, if you need to the min/max-port mode of operation then that's fine, but if this were up to me, i'd try hard to find a way to avoid it. :) regards, pavel On 3 February 2018 at 04:12, Jason Molenda <jmole...@apple.com> wrote: > Hi Pavel, I closed this phabracator, I've been working on this on-and-off and > I think a different approach would be better. > > I am working up a patch where I change tools/lldb-server/lldb-platform.cpp > from using a fork() model to using std::thread's. The main thread listens > for incoming connections, and when one is received, it starts up a new > std::thread to handle that connection. I have a PortCoordinator singleton > object that manages the ports we can use for communications. Newly created > threads request ports from (& free them when the thread is finished) so that > it would be possible to run multiple tests at the same time. > > The current code fork()'s when it receives a connection, and gives the child > process a copy of all the ports that are available. Because it's in a > separate process, if we were to receive a second connection and fork off a > new process, it would have the same list of ports listed as available. When > debugserver is being used, this is a problem - when the lldb-server platform > thread/process gets a qLaunchGDBServer packet, we run debugserver saying "hey > debugserver listen for a connection on port N" and then tell the remote lldb > process "there's a debugserver waiting to hear from you on port N" - so > lldb-server can't test whether port N is in use or not. > > (there was also a problem in GDBRemoteCommunication::StartDebugserverProcess > where it has a url like localhost:1111 and then it tries to run debugserver > in --named-pipe mode even though we already have a port # to use in the url.) > > The final problem is that TCPSocket::Accept waits for an incoming connection > on the assigned port #, and when it comes in it gets a new file descriptor > for that connection. It should resume listening to that assigned port for > the next connection that comes in ... but today TCPSocket::Accept calls > CloseListenSockets() so after opening the first lldb-server platform > connection that comes in, when we go to accept() the second, the socket has > been closed and we exit immediately. To quickly recap the POSIX API (I never > do network programming so I have to remind myself of this every time), the > workflow for a server like this usually looks like > > parentfd = socket(AF_INET, SOCK_STREAM, 0); > optval = 1; > setsockopt(parentfd, SOL_SOCKET, SO_REUSEADDR, > (const void *)&optval , sizeof(int)); > serveraddr.sin_family = AF_INET; > serveraddr.sin_addr.s_addr = htonl(INADDR_ANY); > serveraddr.sin_port = htons((unsigned short)portno); > bind(parentfd, (struct sockaddr *) &serveraddr, sizeof(serveraddr); > listen(parentfd, 100); // allow 100 connections to queue up -- whatever > childfd = accept(parentfd, (struct sockaddr *) &clientaddr, &clientlen); > > and then we go back to accept()'ing on parentfd after we've spun off > something to talk to childfd. > > > I've been doing all of my work on an old branch for reasons that are boring, > so some/all of this may be fixed on top of tree already! But this is what I > had to do to get my branch to work correctly on a remote system. > > I also noticed that dotest.py won't run multiple debug sessions with a remote > target. That was part of my goal, to speed these up, but it seems to run in > --no-multiprocess mode currently. > > > I'll be picking this up next week - my changes right now are a mess, and > they're against this old branch that I have to work on, but I'll get them > cleaned up & updated to top of tree and make up a patchset. But I wanted to > give you a heads up on where I'm headed as this touches a lot of your code. > > Thanks! > > > J > > > >> On Jan 18, 2018, at 3:44 AM, Pavel Labath via Phabricator >> <revi...@reviews.llvm.org> wrote: >> >> labath added a comment. >> >> In https://reviews.llvm.org/D42210#979608, @jasonmolenda wrote: >> >>> Jim remembers some problem with logging across a fork()'ed process, Pavel >>> does this ring a bell? This change might be completely bogus -- but it's >>> very difficult to debug the child side of an lldb-server platform today. >> >> >> I believe Jim is thinking of https://reviews.llvm.org/D38938. The issue is >> that if another thread holds the logging mutex while we fork(), the mutex >> will remain in a bogus state in the child process. This would mean that any >> operation on the Log subsystem (such as enabling logging) could hang. We >> hold the mutex for just a couple of instructions (just enough to copy a >> shared_ptr), but after some code reshuffles, we were hitting this case very >> often in liblldb. >> >> Now, I don't think this can happen here as at the point where we are >> forking, the platform process is single-threaded. So, enabling logging here >> should be safe, but only accidentally. It should be possible to make this >> always safe, but that will need to be done with a very steady hand. My >> opinion is we should not bother with that (i.e., just check this in as-is) >> until we actually need more threads in the platform, particularly as I think >> the long-term direction here should be to replace the fork with a new thread >> for handling the connection. >> >> As for testing, the problem we have now is that we have no direct platform >> tests today -- remote tests will test platform by the virtue of all >> connections going through it but a standard check-lldb will not even run >> this code. I have been planning to add tests like these, but I haven't got >> around to that yet, and I'm not sure when will that happen. If you want to >> take a stab at it, I can give you some pointers. >> >> BTW: for most debugging needs you should be able to just start the platform >> without the --server argument, which will disable forking and handle each >> connection in the main process. >> >> >> Repository: >> rL LLVM >> >> https://reviews.llvm.org/D42210 >> >> >> > _______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits