On Mon, 30 Apr 2018 at 20:13, Scott Funkenhauser <sfunkenhau...@google.com> wrote:
> I messed up my test setup and incorrectly attributed the additional time. Parsing the DWARF sections does add additional time, but only by a small percentage.
>
> By pure chance I noticed that the latency between the client and server had a huge impact. I did some tests against lldb-server running on a machine with an RTT of 18ms to the client. The difference in load time in my real-world example (initializing Vulkan) was around 16x slower (~3.5s vs 55s). I did some more digging, and it looks like certain operations perform a number of synchronous requests to the server (DYLDRendezvous::TakeSnapshot - updating SO entries, and ThreadList::WillResume - grabbing SP and FP for every thread). Since all the requests are synchronous, they are extremely sensitive to increased latency.
>
> Is this a limitation of the gdb-server (can't handle parallel requests)?
> Or is this not a common use case, and is this not a known issue?

This is a known issue, though I did not expect it to have that much of an impact. In fact, I have trouble reconciling this with your earlier statement that the second and subsequent runs are much faster. The SO entry loading has to happen on every run, so I don't see why the second run would be faster. That would be more consistent with the debug-info parsing case, as there we only index the DWARF once (if it hasn't changed). So I think we are missing something here.

In any case, this is not a fundamental limitation, and there are ways to remove it. The most obvious one is to move the rendezvous structure parsing to the server -- there is even a gdb packet for that, though I don't remember its name off-hand (sketch below). Currently we support it in the client (for communicating with stubs that implement it), but not in lldb-server.

For the register reading part, we usually make sure the "important" registers are sent in a batch, so that the client does not have to request each one separately. At the moment that is only the PC, because it used to be enough at some point. I don't know whether anything has changed in the client to make it ask for more registers, but this is something that needs to be looked at more closely.
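Concretely, that batching happens via the "expedited" registers in the stop-reply packet. A stop reply currently looks something like this (thread id, register number, and value are made up for illustration, and the register numbering is target-dependent):

  <- T05thread:1a2b;10:d005400000000000;

The "10:d005...;" pair is the expedited PC, riding along with the stop notification for free. If the server expedited SP and FP for each thread as well, ThreadList::WillResume would not need a separate synchronous read per thread.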
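As for the rendezvous packet, it is most likely qXfer:libraries-svr4:read (from the gdb remote protocol; I'd have to double-check), which lets the server hand back the entire SVR4 library list in one reply instead of the client walking the link_map chain one memory read at a time. The exchange looks roughly like this ($-framing and checksums omitted; names and addresses made up):

  -> qXfer:libraries-svr4:read::0,ffff
  <- l<library-list-svr4 version="1.0">
       <library name="/usr/lib/libc.so.6" lm="0x7f0868001000"
                l_addr="0x7f0867c00000" l_ld="0x7f0867fdcd80"/>
       ...
     </library-list-svr4>

One round trip for the whole list, no matter how many libraries there are.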
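To put rough numbers on the latency sensitivity (the module and packet counts here are illustrative assumptions, not measurements): with ~100 shared objects and ~3 synchronous packets per SO entry, the rendezvous walk alone costs about

  100 * 3 * 18 ms = 5.4 s

of pure round-trip time at your RTT, versus effectively nothing on a local connection. Every other synchronous per-stop operation (like the per-thread register reads) scales with the RTT the same way, so this is consistent with the kind of slowdown you are seeing.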
> I enabled "gdb-remote all" logging, and searched for all instances of '0x00005652ACF3F120' (the address of the connection object that is reporting the timeout?). There seems to be a pretty good correlation between the timeouts and 'Communication::SyncronizeWithReadThread', but unfortunately I haven't had time to investigate further.
>
> 1525110345.916504145 0x5652acefcbb0 Broadcaster("lldb.process")::RestoreBroadcaster (about to pop listener("lldb.PlatformLinux.DebugProcess.hijack")=0x5652acf079a0)
> 1525110345.916502953 this = 0x00005652ACF3F120, timeout = 5000000 us
> --
> 1525110345.919557333 0x7f0868000940 'Communication::SyncronizeWithReadThread' Listener::FindNextEventInternal(broadcaster=(nil), broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event 0x7f086c0008c0
> 1525110345.919566154 this = 0x00005652ACF3F120, timeout = 5000000 us
> --
> 1525110346.123922110 0x7f0868000d10 'Communication::SyncronizeWithReadThread' Listener::FindNextEventInternal(broadcaster=(nil), broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event 0x7f086c0008c0
> 1525110346.123931408 this = 0x00005652ACF3F120, timeout = 5000000 us
> --
> 1525110346.152676821 0x7f0868006710 'Communication::SyncronizeWithReadThread' Listener::FindNextEventInternal(broadcaster=(nil), broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event 0x7f086c0008c0
> 1525110346.152685642 this = 0x00005652ACF3F120, timeout = 5000000 us
> --
> 1525110346.167683363 0x7f08682b2fe0 'Communication::SyncronizeWithReadThread' Listener::FindNextEventInternal(broadcaster=(nil), broadcaster_names=(nil)[0], event_type_mask=0x00000000, remove=1) event 0x7f086c0008c0
> 1525110346.167692184 this = 0x00005652ACF3F120, timeout = 5000000 us

I think the timeout question is a red herring, tbh. These two are correlated because they are both things we need to do around a process stop (flush its STDIO after it stops, and start the read thread after we resume it). They run fast, and there are no timeouts involved.

> --
> 1525110351.172777176 error: timed out, status = timed out, uri =
> 1525110351.172847271 this = 0x00005652ACF3F120, timeout = 5000000 us
> 1525110356.173308611 error: timed out, status = timed out, uri =
> 1525110356.173368216 this = 0x00005652ACF3F120, timeout = 5000000 us
> 1525110361.175591230 error: timed out, status = timed out, uri =
> 1525110361.175647497 this = 0x00005652ACF3F120, timeout = 5000000 us
> 1525110366.180710316 error: timed out, status = timed out, uri =
> 1525110366.180769205 this = 0x00005652ACF3F120, timeout = 5000000 us

And I bet these happen while the process is running, so they do not impact the latency at all. It's just us waiting for the process to stop in a loop.
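For what it's worth, the shape of that loop is roughly the following (a simplified sketch, not the actual lldb source; WaitForStopEvent is a hypothetical stand-in for the real event wait):

  #include <chrono>
  #include <cstdio>

  // Hypothetical stand-in for the real event wait: returns true when a
  // stop event arrives, false when the wait times out.
  bool WaitForStopEvent(std::chrono::microseconds timeout);

  void WaitForProcessToStop() {
    // A 5 s timeout only means "the inferior is still running", so we
    // log it and go around again; the timeouts add no latency.
    while (!WaitForStopEvent(std::chrono::microseconds(5000000)))
      std::puts("error: timed out, status = timed out"); // the lines above
  }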