Re: [Lldb-commits] [PATCH] D42210: Re-enable logging on the child side of a fork() in lldb-server platform mode

Jason Molenda via lldb-commits Mon, 05 Feb 2018 11:39:20 -0800

Yeah I saw the TCPSocket::Accept differences on top of tree after sending the 
email - unfortunately I have to do my work on this old branch, and I can't 
build our top of tree sources for the iphone target right now. It's not a great 
situation!


The min/max ports are needed for running the testsuite on a device, we run a 
program to relay a specific list of ports between the mac and the phone so we 
have to depend on that.

I'll see if I can get the testsuite running on my local mac via lldb-server 
platform on top of tree and see what changes are needed there, hopefully I can 
repo the same problems I'm hitting on this old branch if they're still 
applicable.


> On Feb 5, 2018, at 2:56 AM, Pavel Labath <lab...@google.com> wrote:
> 
> Hi Jason,
> 
> Thanks for the heads up. I look forward to getting rid of fork()
> there, but there is one thing that's not clear to me. You say that
> TCPSocket::Accept() calls CloseListenSockets().. I see don't see
> anything like that in the current code, and I know for a fact that we
> are able to handle multiple (concurrent) connections, as that is how
> we run tests on android right now.  Could this be some problem with
> the branch that you were working on?
> 
> In general, if you don't need to use the --min-port/--max-port
> arguments (that is indeed broken in the fork mode), then everything
> should already be set-up for the parallel remote testing. In fact,
> unless you have a hard requirement to use --min/max-port then I would
> advise against it. I think it can cause you a lot of problems down the
> line. The issue with vending out ports like this is that the
> "PortCoordinator" does not have control over the whole system, and it
> cannot prevent some other process from claiming one of the ports that
> it "owns". So, you may need to implement some retry logic to handle
> the case when debugserver cannot bind to the port that it was assigned
> to.
> 
> I think that a much more reliable and simpler solution would be to
> have debugserver ask the OS to assign it an unused port (by binding to
> port zero), and then report that port back to lldb-platform (via the
> --named-pipe arg -- this is how we use llgs right now). Besides fixing
> the port allocation problem, this will also ensure that by the time
> the platform reports the port back to the client, debugserver is
> already primed and waiting for a connection. I'm not sure if
> debugserver right now supports the --named-pipe argument, but it
> should be fairly trivial to add it if needed.
> 
> Of course, if you need to the min/max-port mode of operation then
> that's fine, but if this were up to me, i'd try hard to find a way to
> avoid it. :)
> 
> regards,
> pavel
> 
> On 3 February 2018 at 04:12, Jason Molenda <jmole...@apple.com> wrote:
>> Hi Pavel, I closed this phabracator, I've been working on this on-and-off 
>> and I think a different approach would be better.
>> 
>> I am working up a patch where I change tools/lldb-server/lldb-platform.cpp 
>> from using a fork() model to using std::thread's.  The main thread listens 
>> for incoming connections, and when one is received, it starts up a new 
>> std::thread to handle that connection.  I have a PortCoordinator singleton 
>> object that manages the ports we can use for communications.  Newly created 
>> threads request ports from (& free them when the thread is finished) so that 
>> it would be possible to run multiple tests at the same time.
>> 
>> The current code fork()'s when it receives a connection, and gives the child 
>> process a copy of all the ports that are available.  Because it's in a 
>> separate process, if we were to receive a second connection and fork off a 
>> new process, it would have the same list of ports listed as available.  When 
>> debugserver is being used, this is a problem - when the lldb-server platform 
>> thread/process gets a qLaunchGDBServer packet, we run debugserver saying 
>> "hey debugserver listen for a connection on port N" and then tell the remote 
>> lldb process "there's a debugserver waiting to hear from you on port N" - so 
>> lldb-server can't test whether port N is in use or not.
>> 
>> (there was also a problem in GDBRemoteCommunication::StartDebugserverProcess 
>> where it has a url like localhost:1111 and then it tries to run debugserver 
>> in --named-pipe mode even though we already have a port # to use in the url.)
>> 
>> The final problem is that TCPSocket::Accept waits for an incoming connection 
>> on the assigned port #, and when it comes in it gets a new file descriptor 
>> for that connection.  It should resume listening to that assigned port for 
>> the next connection that comes in ... but today TCPSocket::Accept calls 
>> CloseListenSockets() so after opening the first lldb-server platform 
>> connection that comes in, when we go to accept() the second, the socket has 
>> been closed and we exit immediately.  To quickly recap the POSIX API (I 
>> never do network programming so I have to remind myself of this every time), 
>> the workflow for a server like this usually looks like
>> 
>> parentfd = socket(AF_INET, SOCK_STREAM, 0);
>> optval = 1;
>> setsockopt(parentfd, SOL_SOCKET, SO_REUSEADDR,
>>             (const void *)&optval , sizeof(int));
>> serveraddr.sin_family = AF_INET;
>> serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
>> serveraddr.sin_port = htons((unsigned short)portno);
>> bind(parentfd, (struct sockaddr *) &serveraddr, sizeof(serveraddr);
>> listen(parentfd, 100); // allow 100 connections to queue up -- whatever
>> childfd = accept(parentfd, (struct sockaddr *) &clientaddr, &clientlen);
>> 
>> and then we go back to accept()'ing on parentfd after we've spun off 
>> something to talk to childfd.
>> 
>> 
>> I've been doing all of my work on an old branch for reasons that are boring, 
>> so some/all of this may be fixed on top of tree already!  But this is what I 
>> had to do to get my branch to work correctly on a remote system.
>> 
>> I also noticed that dotest.py won't run multiple debug sessions with a 
>> remote target.  That was part of my goal, to speed these up, but it seems to 
>> run in --no-multiprocess mode currently.
>> 
>> 
>> I'll be picking this up next week - my changes right now are a mess, and 
>> they're against this old branch that I have to work on, but I'll get them 
>> cleaned up & updated to top of tree and make up a patchset.  But I wanted to 
>> give you a heads up on where I'm headed as this touches a lot of your code.
>> 
>> Thanks!
>> 
>> 
>> J
>> 
>> 
>> 
>>> On Jan 18, 2018, at 3:44 AM, Pavel Labath via Phabricator 
>>> <revi...@reviews.llvm.org> wrote:
>>> 
>>> labath added a comment.
>>> 
>>> In https://reviews.llvm.org/D42210#979608, @jasonmolenda wrote:
>>> 
>>>> Jim remembers some problem with logging across a fork()'ed process, Pavel 
>>>> does this ring a bell?  This change might be completely bogus -- but it's 
>>>> very difficult to debug the child side of an lldb-server platform today.
>>> 
>>> 
>>> I believe Jim is thinking of https://reviews.llvm.org/D38938. The issue is 
>>> that if another thread holds the logging mutex while we fork(), the mutex 
>>> will remain in a bogus state in the child process. This would mean that any 
>>> operation on the Log subsystem (such as enabling logging) could hang. We 
>>> hold the mutex for just a couple of instructions (just enough to copy a 
>>> shared_ptr), but after some code reshuffles, we were hitting this case very 
>>> often in liblldb.
>>> 
>>> Now, I don't think this can happen here as at the point where we are 
>>> forking, the platform process is single-threaded. So, enabling logging here 
>>> should be safe, but only accidentally. It should be possible to make this 
>>> always safe, but that will need to be done with a very steady hand. My 
>>> opinion is we should not bother with that (i.e., just check this in as-is) 
>>> until we actually need more threads in the platform, particularly as I 
>>> think the long-term direction here should be to replace the fork with a new 
>>> thread for handling the connection.
>>> 
>>> As for testing, the problem we have now is that we have no direct platform 
>>> tests today -- remote tests will test platform by the virtue of all 
>>> connections going through it but a standard check-lldb will not even run 
>>> this code. I have been planning to add tests like these, but I haven't got 
>>> around to that yet, and I'm not sure when will that happen. If you want to 
>>> take a stab at it, I can give you some pointers.
>>> 
>>> BTW: for most debugging needs you should be able to just start the platform 
>>> without the --server argument, which will disable forking and handle each 
>>> connection in the main process.
>>> 
>>> 
>>> Repository:
>>> rL LLVM
>>> 
>>> https://reviews.llvm.org/D42210
>>> 
>>> 
>>> 
>> 

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Re: [Lldb-commits] [PATCH] D42210: Re-enable logging on the child side of a fork() in lldb-server platform mode

Reply via email to