[lldb-dev] Unreliable process attach on Linux
Consider this example program:

#include <err.h>
#include <sys/wait.h>
#include <unistd.h>

#include <lldb/API/SBDebugger.h>
#include <lldb/API/SBProcess.h>
#include <lldb/API/SBTarget.h>

int
main(void)
{
  // Target process for the debugger.
  pid_t pid = fork();
  if (pid < 0)
    err(1, "fork");
  if (pid == 0)
    while (true)
      pause();

  lldb::SBDebugger::Initialize();
  {
    auto debugger(lldb::SBDebugger::Create());
    if (!debugger.IsValid())
      errx(1, "SBDebugger::Create failed");

    auto target(debugger.CreateTarget(nullptr));
    if (!target.IsValid())
      errx(1, "SBDebugger::CreateTarget failed");

    lldb::SBAttachInfo attachinfo(pid);
    lldb::SBError error;
    auto process(target.Attach(attachinfo, error));
    if (!process.IsValid())
      errx(1, "SBTarget::Attach failed: %s", error.GetCString());
    error = process.Detach();
    if (error.Fail())
      errx(1, "SBProcess::Detach failed: %s", error.GetCString());
  }
  lldb::SBDebugger::Terminate();

  if (kill(pid, SIGKILL) != 0)
    err(1, "kill");
  if (waitpid(pid, NULL, 0) < 0)
    err(1, "waitpid");

  return 0;
}

Run it in a loop like this:

$ while ./test-attach ; do date; done

On Linux x86-64 (Fedora 29), with LLDB 7 (lldb-7.0.0-1.fc29.x86_64) and
kernel 4.19.12 (kernel-4.19.12-301.fc29.x86_64), after 100 iterations or
so, attaching to the newly created process fails:

test-attach: SBTarget::Attach failed: lost connection

This also reproduces occasionally with LLDB itself (with “lldb -p PID”).

Any suggestions how to get more information about the cause of this
error?

Thanks,
Florian
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
Re: [lldb-dev] Unreliable process attach on Linux
* Jan Kratochvil:

> On Fri, 04 Jan 2019 17:38:42 +0100, Florian Weimer via lldb-dev wrote:
>> Run it in a loop like this:
>>
>> $ while ./test-attach ; do date; done
>>
>> On Linux x86-64 (Fedora 29), with LLDB 7 (lldb-7.0.0-1.fc29.x86_64) and
>> kernel 4.19.12 (kernel-4.19.12-301.fc29.x86_64), after 100 iterations or
>> so, attaching to the newly created process fails:
>>
>> test-attach: SBTarget::Attach failed: lost connection
>
> FYI after 3 runs it still runs fine with your reproducer both with system
> lldb-devel-7.0.0-1.fc29.x86_64 and COPR
> lldb-experimental-devel-8.0.0-0.20190102snap0.fc29.x86_64 (=trunk), part
> running without /usr/lib/debug and part with.

Well, that's odd.  Shall I try to reproduce this on a lab machine?

> Fedora 29 x86_64 + kernel-4.19.10-300.fc29.x86_64
>
> (I haven't investigated the code why it could fail this way.)

First, I want to get more logging data out of LLDB.  Maybe this will
tell us where things go wrong.

Thanks,
Florian
Re: [lldb-dev] Unreliable process attach on Linux
* Zachary Turner:

> I'd be curious to see if the PID of the process that is failed to
> attach to is the same as one of the PIDs of a process that was
> previously attached to (and if so, if it is the first such case where
> a PID is recycled).

I added logging of the PID, and got this (the failure happened rather
quickly this time):

$ while ./test-attach ; do date ; done
PID: 16658
Sat Jan 5 21:28:28 CET 2019
PID: 16831
Sat Jan 5 21:28:29 CET 2019
PID: 17006
Sat Jan 5 21:28:30 CET 2019
PID: 17176
Sat Jan 5 21:28:30 CET 2019
PID: 17351
Sat Jan 5 21:28:31 CET 2019
PID: 17526
Sat Jan 5 21:28:32 CET 2019
PID: 17700
test-attach: SBTarget::Attach failed: lost connection

So there isn't any evidence of PID reuse.

My lldb -p test case also triggers this with a long-running process, so
there isn't any PID reuse there either.

Thanks,
Florian
Re: [lldb-dev] Unreliable process attach on Linux
* Pavel Labath:

> On 04/01/2019 17:38, Florian Weimer via lldb-dev wrote:
>> Consider this example program:
>>
>> #include <err.h>
>> #include <sys/wait.h>
>> #include <unistd.h>
>>
>> #include <lldb/API/SBDebugger.h>
>> #include <lldb/API/SBProcess.h>
>> #include <lldb/API/SBTarget.h>
>>
>> int
>> main(void)
>> {
>>   // Target process for the debugger.
>>   pid_t pid = fork();
>>   if (pid < 0)
>>     err(1, "fork");
>>   if (pid == 0)
>>     while (true)
>>       pause();
>>
>>   lldb::SBDebugger::Initialize();
>>   {
>>     auto debugger(lldb::SBDebugger::Create());
>>     if (!debugger.IsValid())
>>       errx(1, "SBDebugger::Create failed");
>>
>>     auto target(debugger.CreateTarget(nullptr));
>>     if (!target.IsValid())
>>       errx(1, "SBDebugger::CreateTarget failed");
>>
>>     lldb::SBAttachInfo attachinfo(pid);
>>     lldb::SBError error;
>>     auto process(target.Attach(attachinfo, error));
>>     if (!process.IsValid())
>>       errx(1, "SBTarget::Attach failed: %s", error.GetCString());
>>     error = process.Detach();
>>     if (error.Fail())
>>       errx(1, "SBProcess::Detach failed: %s", error.GetCString());
>>   }
>>   lldb::SBDebugger::Terminate();
>>
>>   if (kill(pid, SIGKILL) != 0)
>>     err(1, "kill");
>>   if (waitpid(pid, NULL, 0) < 0)
>>     err(1, "waitpid");
>>
>>   return 0;
>> }
>>
>> Run it in a loop like this:
>>
>> $ while ./test-attach ; do date; done
>>
>> On Linux x86-64 (Fedora 29), with LLDB 7 (lldb-7.0.0-1.fc29.x86_64) and
>> kernel 4.19.12 (kernel-4.19.12-301.fc29.x86_64), after 100 iterations or
>> so, attaching to the newly created process fails:
>>
>> test-attach: SBTarget::Attach failed: lost connection
>>
>> This also reproduces occasionally with LLDB itself (with “lldb -p PID”).
>>
>> Any suggestions how to get more information about the cause of this
>> error?
>>
>
> I would recommend enabling gdb-remote logging (so something like:
> debugger.HandleCommand("log enable gdb-remote packets")) to see at
> which stage do we actually lose the gdb-server connection.

Thanks.

I enabled logging like this:

    auto debugger(lldb::SBDebugger::Create());
    if (!debugger.IsValid())
      errx(1, "SBDebugger::Create failed");

    debugger.HandleCommand("log enable gdb-remote packets");

    auto target(debugger.CreateTarget(nullptr));
    if (!target.IsValid())
      errx(1, "SBDebugger::CreateTarget failed");

And here's the output I get:

test-attach <   1> send packet: +
test-attach history[1] tid=0x1cab <   1> send packet: +
test-attach <  19> send packet: $QStartNoAckMode#b0
test-attach <   1> read packet: +
test-attach <   6> read packet: $OK#9a
test-attach <   1> send packet: +
test-attach <  41> send packet: $qSupported:xmlRegisters=i386,arm,mips#12
test-attach < 124> read packet: $PacketSize=2;QStartNoAckMode+;QThreadSuffixSupported+;QListThreadsInStopReply+;qEcho+;QPassSignals+;qXfer:auxv:read+#be
test-attach <  26> send packet: $QThreadSuffixSupported#e4
test-attach <   6> read packet: $OK#9a
test-attach <  27> send packet: $QListThreadsInStopReply#21
test-attach <   6> read packet: $OK#9a
test-attach <  13> send packet: $qHostInfo#9b
test-attach <  11> send packet: $qEcho:1#5b
test-attach: SBTarget::Attach failed: lost connection

Florian
Re: [lldb-dev] Unreliable process attach on Linux
* Pavel Labath:

> Thanks. I think this is what I suspected. The server is extremely slow
> in responding to the qHostInfo packet. This timeout for this was
> recently increased to 10 seconds, but it looks like 7.0 still has the
> default (1 second) timeout.
>
> If you don't want to recompile or update, you should be able to work
> around this by increasing the default timeout with the following
> command "settings set plugin.process.gdb-remote.packet-timeout 10".

I see, that helps.

There's a host name in the qHostInfo response?  Where's the code that
determines the host name?  On the other end?  I wonder if it performs a
DNS lookup.  That could explain the delay.

Thanks,
Florian
Re: [lldb-dev] Unreliable process attach on Linux
* Pavel Labath:

> Yes, there's a dns lookup being done on the other end. TBH, I'm not
> really sure what's it being used for. Maybe we should try deleting the
> hostname field from the qHostInfo response (or just put an IP address
> there).

Or use the system host name without resorting to DNS (using uname or
gethostname on GNU/Linux).  The DNS lookup is really surprising.

Thanks,
Florian
[lldb-dev] Access to TLS variables on GNU/Linux
I'm trying to access thread-local variables using the API on GNU/Linux.
Here's my test program:

#include <err.h>
#include <sys/wait.h>
#include <unistd.h>

#include <lldb/API/SBDebugger.h>
#include <lldb/API/SBProcess.h>
#include <lldb/API/SBTarget.h>

thread_local int global_tls_variable
  __attribute__ ((tls_model ("initial-exec"))) = 17;

int
main(void)
{
  // Target process for the debugger.
  pid_t pid = fork();
  if (pid < 0)
    err(1, "fork");
  if (pid == 0)
    while (true)
      pause();

  lldb::SBDebugger::Initialize();
  {
    lldb::SBDebugger debugger{lldb::SBDebugger::Create()};
    if (!debugger.IsValid())
      errx(1, "SBDebugger::Create failed");

    lldb::SBTarget target{debugger.CreateTarget(nullptr)};
    if (!target.IsValid())
      errx(1, "SBDebugger::CreateTarget failed");

    lldb::SBAttachInfo attachinfo(pid);
    lldb::SBError error;
    lldb::SBProcess process{target.Attach(attachinfo, error)};
    if (!process.IsValid())
      errx(1, "SBTarget::Attach failed: %s", error.GetCString());

    lldb::SBValue value{target.FindFirstGlobalVariable("global_tls_variable")};
    if (!value.IsValid())
      errx(1, "SBTarget::FindFirstGlobalVariable: %s",
           value.GetError().GetCString());
    printf("global_tls_variable (LLDB): %d\n",
           (int) value.GetValueAsSigned());
    printf("value type: %d\n", (int) value.GetValueType());
  }
  lldb::SBDebugger::Terminate();

  if (kill(pid, SIGKILL) != 0)
    err(1, "kill");
  if (waitpid(pid, NULL, 0) < 0)
    err(1, "waitpid");

  return 0;
}

It prints:

global_tls_variable (LLDB): 0
value type: 4

The target process has loaded libpthread.so.0, so it's not the usual
problem of libthread_db not working without libpthread.

On the other hand, I realize now that the lldb command cannot access TLS
variables, either.  Is this expected to work at all?

I'm using lldb-7.0.1-1.fc29.x86_64 from Fedora 29 (which is built around
GCC 8 and glibc 2.28).

Thanks,
Florian
Re: [lldb-dev] Access to TLS variables on GNU/Linux
* Jan Kratochvil:

> On Tue, 14 May 2019 13:38:57 +0200, Florian Weimer via lldb-dev wrote:
>> The target process has loaded libpthread.so.0, so it's not the usual
>> problem of libthread_db not working without libpthread.
>>
>> On the other hand, I realize now that the lldb command cannot access TLS
>> variables, either.  Is this expected to work at all?
>
> TLS is implemented only for FreeBSD as there is
> FreeBSDThread::GetThreadPointer() but on Linux it falls back to unimplemented:
>   lldb::addr_t Thread::GetThreadPointer() { return LLDB_INVALID_ADDRESS; }
>
> On Linux it uses DynamicLoaderPOSIXDYLD::GetThreadLocalData() which may work
> without libthread_db as it is reading "_thread_db_*" symbols in
> DYLDRendezvous::GetThreadInfo().  But it needs that GetThreadPointer() which
> could get implemented (for x86_64) by reading %fs_base.  LLDB currently does
> not know anything about %fs_base+%gs_base.
>

If I can get the TLS base address on x86 for a thread and if LLDB can
expose the offset of an initial-exec TLS variable inside the TLS block
of an object (which should be encoded in the ELF data), I can poke at
glibc internals and figure out the offset from thread pointer.
(Global-dynamic TLS is much more difficult to handle, of course.)

> Is it a good idea to implement %fs_base+%gs_base to make TLS working
> on Linux?

The register access would help, I think.  Even if the rest doesn't
work.  If you have an experimental build, I can try it.

Thanks,
Florian